Abstract We consider two classes of constrained finite state-action stochastic
games. First, we consider a two player nonzero sum single controller
constrained stochastic game with both average and discounted cost criteria. We
consider the same type of constraints as in [1], i.e., player 1 has
subscription based constraints and player 2, who controls the transition
probabilities, has realization based constraints which can also depend on the
strategies of player 1. Next, we consider an N-player nonzero sum constrained
stochastic game with independent state processes where each player has an
average cost criterion, as discussed in [2]. We show that the stationary Nash
equilibria of both classes of constrained games, which exist under strong
Slater and irreducibility conditions [3], [5], are in one to one
correspondence with the global minima of certain mathematical programs. In the
single controller game, if the constraints of player 2 do not depend on the
strategies of player 1, then the mathematical program reduces to a non-convex
quadratic program. In the two player stochastic game with independent state
processes, if the constraints of a player do not depend on the strategies of
the other player, then the mathematical program reduces to a non-convex
quadratic program. Computational algorithms for finding global minima of
non-convex quadratic programs exist [4], and hence one can compute Nash
equilibria of these constrained stochastic games. Our results generalize some
existing results for zero sum games [1], [5], [?].



* A portion of Section 3 (two player case) has been presented in the 8th International
ISDG workshop at University of Padova, Italy on 21-23 July, 2011.
1 Introduction

Game theory and mathematical programming are closely related. It is well known
that equilibrium strategies in two player zero sum matrix games correspond to
optimal points of certain linear programs. In 1964, Mangasarian and Stone [8]
showed that the Nash equilibria of any two player bimatrix game can be
obtained from the global maxima of a quadratic program, and this approach can
be generalized to any finite number of players. Later, Filar et al. [2]
generalized this idea to infinite horizon stochastic games with finite state
space and finite action spaces for all the players. It has been shown that the
stationary Nash equilibria of any N-player stochastic game with discounted
criterion are in one to one correspondence with the global minima of a certain
mathematical program [5], [TD]; so, Nash equilibria of such a stochastic game
can be computed via the global minima of one mathematical program. The
stochastic games described in [3], [T^ can be viewed as centralized stochastic
games. In such centralized stochastic games all the players jointly control a
single Markov chain, all the players have complete information of the Markov
chain's state, and when taking a decision at any time t each player knows all
the actions previously taken by the players. The review article [11]
summarizes various algorithmic aspects of zero sum stochastic games along with
algorithms for nonzero sum stochastic games with special structure (single
controller, etc.). In particular, a two player zero sum single controller
stochastic game can be solved by a linear program [12], [10], [13], and the
Nash equilibria of a nonzero sum single controller stochastic game can be
obtained from the global minima of a quadratic program [14].

Since the seminal work of Lloyd S. Shapley [TS], stochastic games have come
to constitute an important class of models that capture game theoretic issues
among the decision makers involved, while also accounting for the random
evolution of the system. The edited volume by Neyman and Sorin [THj collects
many articles on stochastic games and their applications. The book by Filar
and Vrieze [TU] presents stochastic games as a natural multi-player
generalization of (single player) Markov decision processes, together with
their applications. Constrained stochastic games are realistic because they
can capture bounds on the consumption of resources, but they are also
difficult to analyze. In [3], N-player centralized constrained stochastic
games with both discounted and average cost criteria and with finite state and
finite action spaces are considered, and it is shown that a stationary Nash
equilibrium exists under a strong Slater condition (an irreducibility
assumption is also needed in the average case). The existence of a Nash
equilibrium for constrained stochastic games when the state space is countable
and the action spaces are compact metric spaces is discussed in [17]. A
characterization of Nash equilibria for general constrained stochastic games
via a mathematical program is not known. To the best of our knowledge, only
some special classes of constrained stochastic games can be solved as linear
programs. We briefly describe these classes here. The two player zero sum
single controller constrained stochastic game with total expected reward
criterion and with expected average reward criterion is considered in [TH] and
[5], respectively. In both [TB], [5] only the player who controls the
transition probabilities has constraints on his expected rewards, and these
rewards do not depend on the strategies of the other player. Nash equilibria
of such stochastic games can be computed from optimal solutions of linear
programs. Altman et al. [1] considered the zero sum constrained stochastic
game with discounted cost criterion where both players have constraints. The
player who controls the transition probabilities has constraints on his
expected discounted costs, similar to [TH], [5], and the other player has
subscription based constraints. This class of games can also be solved by
linear programs.

Apart from the centralized stochastic games discussed above, some
decentralized stochastic games have recently been considered in the literature
[7], [2], [19]. In decentralized stochastic games each player independently
controls his own Markov chain based on his own state and actions. In [2], an
N-player decentralized constrained stochastic game with average cost criterion
is considered, and it is shown that a Nash equilibrium for these games exists
in stationary strategies under the irreducibility and strong Slater
conditions. In these games each player controls his own Markov chain, and the
constraints of each player also depend on the strategies of all the players.
The application of these games to the modeling of wireless networks is
described in [7], [2], [19]. The two player zero sum game of this class, where
the constraints of each player do not depend on the other player's strategies,
is considered in [?]. These games, with both unichain and multichain structure
on the state processes of both the players, can be solved by linear programs.

In this paper we consider two different classes of constrained stochastic
games. First, we consider a special class of two player nonzero sum
centralized constrained stochastic games, namely a single controller
constrained stochastic game with both average and discounted cost criteria. We
then consider an N-player nonzero sum constrained stochastic game with
independent state processes where all the players use the average cost
criterion, as discussed in [5]. Our results are summarized as follows:

1. We consider a two player nonzero sum single controller constrained
stochastic game with both average and discounted cost criteria, a special
class of centralized constrained stochastic games, with the same type of
constraints as in [1], i.e., player 1 has subscription based constraints
and player 2, who controls the transition probabilities, has realization
based constraints. Unlike the situation in [1] and [5], we consider the
case where the realization based constraints of player 2 depend on the
strategies of both the players. It follows from [3] that there exists a
stationary Nash equilibrium under a strong Slater condition (an
irreducibility assumption is also needed in the average case). We show
that the Nash equilibria of this constrained stochastic game can be
obtained from the global minima of one mathematical program. The converse
statement is also true, i.e., from a stationary Nash equilibrium of these
games we can construct a point which is a global minimum of the
corresponding mathematical program.

2. If the constraints of player 2 do not depend on the strategies of player
1, then the mathematical program reduces to a non-convex quadratic
program. For the zero sum case the linear programs given in [1], [5] can
be recovered from our quadratic program.

3. We show that the stationary Nash equilibria of the N-player nonzero sum
constrained stochastic game with independent state processes [2] can be
obtained from the global minima of a certain mathematical program. The
converse statement is also true, i.e., a stationary Nash equilibrium of
these games, which exists under strong Slater and irreducibility conditions
[2], corresponds to a point which is a global minimum of the corresponding
mathematical program.

4. In the two player constrained stochastic game with independent state
processes, if the constraints of each player do not depend on the other
player's strategies, then the corresponding mathematical program reduces to
a non-convex quadratic program. The linear program given in [7] for the
zero sum game can be obtained as a special case of our quadratic program.

To derive the mathematical programs for both constrained stochastic games we
use the same approach, which is via best response linear programs. We use the
fact that the best response of each player against the fixed strategies of the
other players can be obtained by solving a constrained Markov decision model,
which, in turn, can be solved by a linear program [20]. In both cases, due to
some special structure, we are able to put all the primal-dual pairs of linear
programs (one pair for each player) together to form one mathematical program
whose objective function is nonnegative at all feasible points. As the linear
program which gives the optimal strategy in a constrained Markov decision
model is given in terms of occupation measures, our mathematical programs are
also in terms of these occupation measures. The Nash equilibrium strategy can
be recovered from the occupation measure by a known transformation [20].

Several methods are available for solving non-convex quadratic programming
problems [3], [3T], [32]. The algorithm given in [22] is based on complete
enumeration of the faces of the polyhedron and is therefore not very
efficient, while the cutting plane method of |2] seems to be problematic [23].
The algorithm given in [4] to solve quadratic programs terminates in a finite
number of steps. We note that the algorithm of [3] assumes that the quadratic
program has a global minimum, and this condition is satisfied in our settings.
In [5], one more algorithm, based on a linear programming with complementarity
constraints approach, is given to solve a non-convex quadratic program. This
algorithm does not assume the quadratic program to be bounded below on the
feasible set. (If the quadratic program is not bounded below, then the
algorithm given in [5] computes a feasible ray on which the quadratic program
is unbounded; otherwise, it finds an optimal solution in a finite number of
steps.) In our case the quadratic programs are bounded below on the feasible
set, and hence the algorithm given in [4] is applicable to our settings. One
can also attempt to use general purpose nonlinear solvers to solve these
non-convex quadratic programs, but convergence to a global minimum is not
guaranteed.

We now describe the structure of the rest of the paper. Section 2 contains
the two player nonzero sum single controller constrained stochastic game with
both average and discounted cost criteria and its mathematical programming
formulation. Section 3 contains the N-player constrained stochastic game with
independent state processes with average cost criterion and its mathematical
programming formulation.

2 Single controller constrained stochastic game 

We consider two player nonzero sum single controller constrained stochastic
games with both average and discounted cost criteria. We assume that player 2
controls the Markov chain. As in [1], player 1 has subscription based
constraints and player 2 has realization based constraints, but unlike the
case in [1], [6] the constraints of player 2 can also depend on the strategies
of player 1. This class of stochastic games is described by the following
objects:

(i) $S$ is the finite state space of the game. The generic element of $S$ is
denoted by $s$.

(ii) $\gamma = (\gamma(1), \gamma(2), \dots, \gamma(|S|))$ is a probability
distribution over $S$ according to which the initial state is chosen.

(iii) $A^i$ is the finite action set of player $i$, $i = 1, 2$. Let $A^i(s)$
denote the set of actions available to player $i$ when the state is $s$, where
$A^i = \bigcup_{s \in S} A^i(s)$.

(iv) Define $\mathcal{K} = \{(s, a^1, a^2) : s \in S, a^1 \in A^1(s), a^2 \in A^2(s)\}$
and, for $i = 1, 2$, $\mathcal{K}^i = \{(s, a^i) : s \in S, a^i \in A^i(s)\}$.

(v) $c^i : \mathcal{K} \to \mathbb{R}$ is the immediate cost of player $i$,
$i = 1, 2$. Specifically, $c^i(s, a^1, a^2)$ is the immediate cost incurred by
player $i$ when the state is $s \in S$ and the actions chosen by player 1 and
player 2 are $a^1 \in A^1(s)$ and $a^2 \in A^2(s)$ respectively. Player $i$
wants to minimize the expected cost involving $c^i(\cdot)$, $i = 1, 2$.

(vi) $d^{1,k} : \mathcal{K}^1 \to \mathbb{R}$ is a subscription type cost of
player 1. $d^{1,k}(s, a^1)$ denotes the subscription cost which player 1 has to
pay for using action $a^1$ at state $s$ for the $k$th service,
$k = 1, 2, \dots, n_1$.

(vii) $d^{2,l} : \mathcal{K} \to \mathbb{R}$ is an immediate cost of player 2.
These costs are involved in the $l$th, $l = 1, 2, \dots, n_2$, constraint on
the expected cost of player 2.

(viii) Define $\wp(M)$ as the set of all probability measures over a set $M$.
$p : \mathcal{K}^2 \to \wp(S)$ is the transition probability describing the
dynamics of the game, where $p(s'|s, a^2)$ is the probability of going to state
$s'$ from state $s$ when player 2 chooses action $a^2 \in A^2(s)$. We recall
that the game is controlled by only player 2.

(ix) $\xi^1 = (\xi^1_1, \xi^1_2, \dots, \xi^1_{n_1})^T$ and
$\xi^2 = (\xi^2_1, \xi^2_2, \dots, \xi^2_{n_2})^T$ denote the vectors defining
the given bounds of the constraints on both the players.

The game dynamics are as follows. Initially, at time $t = 0$, the state of the
game is $s$, which is chosen according to the initial distribution $\gamma$;
player 1 chooses an action $a^1 \in A^1(s)$ and player 2 chooses an action
$a^2 \in A^2(s)$ independently of each other. Player 1 receives an immediate
cost of $c^1(s, a^1, a^2)$ and player 2 receives $c^2(s, a^1, a^2)$. Apart from
this, player 2 receives further immediate costs $\{d^{2,l}(s, a^1, a^2)\}$,
$l = 1, 2, \dots, n_2$, which are involved in the expected cost functionals of
player 2 that are constrained by the specified bounds $\{\xi^2_l\}$,
$l = 1, 2, \dots, n_2$. Now, the state of the game switches to a new state
$s' \in S$ at time $t = 1$ with probability $p(s'|s, a^2)$. At time $t = 1$, in
state $s'$, player 1 chooses an action $\bar{a}^1 \in A^1(s')$ and player 2
chooses an action $\bar{a}^2 \in A^2(s')$, and they receive immediate costs
$c^1(s', \bar{a}^1, \bar{a}^2)$ and $c^2(s', \bar{a}^1, \bar{a}^2)$
respectively; player 2 also receives immediate costs
$\{d^{2,l}(s', \bar{a}^1, \bar{a}^2)\}$, $l = 1, 2, \dots, n_2$. The next state
of the game is $s'' \in S$ with probability $p(s''|s', \bar{a}^2)$. The same
thing repeats at state $s''$ and play continues over the infinite time horizon.
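The dynamics above can be illustrated with a minimal Python sketch. It is not part of the original formulation: the function name `simulate`, the nested-list encoding of $p$, and the two state example are illustrative assumptions, and for simplicity both players are assumed to use time-invariant randomized rules `f` and `g`. The point it shows is that the next state is drawn from $p(\cdot|s, a^2)$ only, so the action of player 1 never enters the transition.

```python
import random

def simulate(p, f, g, s0, horizon, seed=0):
    """Sample a path of (state, a1, a2) triples.

    p[s][a2] is the list of next-state probabilities p(.|s, a2);
    f[s] and g[s] are the action distributions of players 1 and 2 in state s.
    """
    rng = random.Random(seed)
    path, s = [], s0
    for _ in range(horizon):
        a1 = rng.choices(range(len(f[s])), weights=f[s])[0]
        a2 = rng.choices(range(len(g[s])), weights=g[s])[0]
        path.append((s, a1, a2))
        # Single controller property: only a2 selects the transition row.
        s = rng.choices(range(len(p[s][a2])), weights=p[s][a2])[0]
    return path

# Hypothetical example: action 0 of player 2 keeps the state, action 1 flips it.
p = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
f = [[0.5, 0.5], [0.5, 0.5]]   # player 1 randomizes uniformly
g = [[0.0, 1.0], [0.0, 1.0]]   # player 2 always flips the state
path = simulate(p, f, g, s0=0, horizon=6)
states = [s for s, _, _ in path]
```

Whatever player 1 draws, the state sequence alternates because player 2 always flips it.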

While the transition probabilities depend only on the present state and the
action used, the actions that are used can depend on the 'past', as in history
dependent strategies. Define a history at time $t$ as
$h_t = (s_0, a^1_0, a^2_0, s_1, a^1_1, a^2_1, \dots, s_{t-1}, a^1_{t-1}, a^2_{t-1}, s_t)$,
where $s_t \in S$, $a^i_t \in A^i(s_t)$, $i = 1, 2$, $t = 0, 1, 2, \dots$. Let
$H_t$ denote the set of all possible histories of length $t$. A decision rule
$f_t : H_t \to \wp(A^1(s_t))$ (resp., $g_t : H_t \to \wp(A^2(s_t))$) of player 1
(resp., player 2) at time $t$ is a function that assigns to any history of
length $t$ a probability measure over the action set of player 1 (resp., player
2). This means that under decision rule $f_t$ (resp., $g_t$), player 1 (resp.,
player 2) chooses action $a^1$ (resp., $a^2$) with probability $f_t(h_t, a^1)$
(resp., $g_t(h_t, a^2)$). The sequence of decision rules is called the strategy
of the player. $f^h = (f_0, f_1, \dots, f_t, \dots)$ and
$g^h = (g_0, g_1, \dots, g_t, \dots)$ denote the strategies of player 1 and
player 2 respectively and are called history dependent (behavioral) strategies.

Let $F$ and $G$ denote the sets of all history dependent strategies of player 1
and player 2 respectively. These strategies are called Markovian if at every
decision epoch the decision rule depends only on the current state, although
the decision rule can differ from epoch to epoch. A stationary strategy is a
Markovian strategy which does not depend on time, i.e., the decision rule is
the same at every decision epoch. So, for a stationary strategy $f_t = f$ and
$g_t = g$ for all $t$, i.e., $(f, f, f, \dots)$ and $(g, g, g, \dots)$ are the
stationary strategies of player 1 and player 2 respectively. We denote, with
some abuse of notation, $f$ and $g$ as the stationary strategies of player 1
and player 2 respectively. Let $F_S$ and $G_S$ denote the sets of all
stationary strategies of player 1 and player 2 respectively. A stationary
strategy $f \in F_S$ is identified with
$f = ((f(1))^T, (f(2))^T, \dots, (f(|S|))^T)^T$, where
$f(s) = (f(s, 1), f(s, 2), \dots, f(s, |A^1(s)|))^T$ for all $s \in S$; $|M|$
denotes the cardinality of a set $M$. Similarly, $g$ is identified with
$g = ((g(1))^T, (g(2))^T, \dots, (g(|S|))^T)^T$, where
$g(s) = (g(s, 1), g(s, 2), \dots, g(s, |A^2(s)|))^T$ for all $s \in S$. For all
$s \in S$, $f(s, a^1)$ is then the probability of choosing action
$a^1 \in A^1(s)$ by player 1 and $g(s, a^2)$ is the probability of choosing
action $a^2 \in A^2(s)$ by player 2 when the state is $s$.

This leads to the introduction of the vector stochastic process
$\{X_t, A^1_t, A^2_t\}_{t=0}^{\infty}$, where $X_t$ denotes the state of the
game, $A^1_t$ the action chosen by player 1 and $A^2_t$ the action chosen by
player 2 at time $t$. An initial distribution $\gamma$ together with a strategy
pair $(f^h, g^h) \in F \times G$ defines a unique probability measure
$P^{f^h, g^h}_{\gamma}$ on an appropriate probability space with respect to
which the laws of the vector stochastic process
$\{X_t, A^1_t, A^2_t\}_{t=0}^{\infty}$ of states and actions can be defined.
The corresponding expectation operator on this probability space is denoted by
$E^{f^h, g^h}_{\gamma}$.



The expected average cost

These costs are average functionals of the states and actions of the game, and
each player minimizes his cost functional. For a given initial distribution
$\gamma$ and strategy pair $(f^h, g^h)$ the expected average cost of player
$i$, $i = 1, 2$, is defined as

$$C^i_{ea}(\gamma, f^h, g^h) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} E^{f^h, g^h}_{\gamma}\, c^i(X_t, A^1_t, A^2_t) \tag{1}$$

where ea stands for expected average.

The expected average constraints

The expected average constraints of player 2 are defined by average
functionals of the states and actions of the game which are bounded by given
reals. For a given initial distribution $\gamma$ and strategy pair
$(f^h, g^h)$ the expected average costs of player 2 are defined as

$$D^{2,l}_{ea}(\gamma, f^h, g^h) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} E^{f^h, g^h}_{\gamma}\, d^{2,l}(X_t, A^1_t, A^2_t), \quad \forall\, l = 1, 2, \dots, n_2.$$

$D^{2,l}_{ea}(\cdot)$ can capture the average consumption of resource $l$,
$l = 1, 2, \dots, n_2$, by player 2. The expected average constraints of player
2 are given by

$$D^{2,l}_{ea}(\gamma, f^h, g^h) \le \xi^2_l, \quad \forall\, l = 1, 2, \dots, n_2. \tag{2}$$

A constraint in (2) captures the fact that the average consumption of resource
$l$ by player 2, when player 1 uses strategy $f^h$ and player 2 uses strategy
$g^h$, is not more than the given constant $\xi^2_l$, $l = 1, 2, \dots, n_2$.
The expected discounted cost

These costs are discounted functionals of the states and actions of the game,
and each player minimizes his cost functional. For a given initial
distribution $\gamma$ and strategy pair $(f^h, g^h)$ the expected discounted
cost of player $i$, $i = 1, 2$, is defined as

$$C^i_{\beta}(\gamma, f^h, g^h) = (1 - \beta) \sum_{t=0}^{\infty} \beta^t E^{f^h, g^h}_{\gamma}\, c^i(X_t, A^1_t, A^2_t) \tag{3}$$

where $\beta \in [0, 1)$ is a fixed discount factor.

The expected discounted constraints

The expected discounted constraints of player 2 are defined by discounted
functionals of the states and actions of the game which are bounded by given
reals. For a given initial distribution $\gamma$ and strategy pair
$(f^h, g^h)$ the expected discounted costs of player 2 are defined as

$$D^{2,l}_{\beta}(\gamma, f^h, g^h) = (1 - \beta) \sum_{t=0}^{\infty} \beta^t E^{f^h, g^h}_{\gamma}\, d^{2,l}(X_t, A^1_t, A^2_t), \quad \forall\, l = 1, 2, \dots, n_2.$$

$D^{2,l}_{\beta}(\cdot)$ can capture the discounted cost for the consumption of
resource $l$, $l = 1, 2, \dots, n_2$, by player 2. The expected discounted
constraints of player 2 are given by

$$D^{2,l}_{\beta}(\gamma, f^h, g^h) \le \xi^2_l, \quad \forall\, l = 1, 2, \dots, n_2. \tag{4}$$

A constraint in (4) captures the fact that the discounted cost for the
consumption of resource $l$ by player 2, when player 1 uses strategy $f^h$ and
player 2 uses strategy $g^h$, is not more than the given real $\xi^2_l$,
$l = 1, 2, \dots, n_2$.
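For stationary strategies the discounted functional can be evaluated numerically. The following is a minimal sketch under stated assumptions, not a procedure taken from the paper: the function name `discounted_cost`, the nested-list data layout and the two state example are hypothetical, and the infinite sum is approximated by iterating the fixed-point relation $w = r + \beta P w$, where $r$ and $P$ are the one-step expected cost and the transition matrix induced by $(f, g)$.

```python
def discounted_cost(c, p, f, g, gamma, beta, iters=2000):
    """Approximate (1 - beta) * sum_t beta^t E[c(X_t, A1_t, A2_t)] for stationary f, g.

    c[s][a1][a2] is an immediate cost, p[s][a2] the next-state distribution.
    """
    n = len(p)
    # Expected one-step cost in each state under the pair (f, g).
    r = [sum(f[s][a1] * g[s][a2] * c[s][a1][a2]
             for a1 in range(len(f[s])) for a2 in range(len(g[s])))
         for s in range(n)]
    # Transition matrix induced by g alone (player 2 is the single controller).
    P = [[sum(g[s][a2] * p[s][a2][t] for a2 in range(len(g[s])))
          for t in range(n)] for s in range(n)]
    w = [0.0] * n
    for _ in range(iters):  # fixed-point iteration w = r + beta * P w
        w = [r[s] + beta * sum(P[s][t] * w[t] for t in range(n)) for s in range(n)]
    return (1 - beta) * sum(gamma[s] * w[s] for s in range(n))

# Hypothetical example: cost 1 in state 1, cost 0 in state 0, player 2 always flips.
p = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
c = [[[0.0, 0.0], [0.0, 0.0]],
     [[1.0, 1.0], [1.0, 1.0]]]
f = [[0.5, 0.5], [0.5, 0.5]]
g = [[0.0, 1.0], [0.0, 1.0]]
value = discounted_cost(c, p, f, g, gamma=[1.0, 0.0], beta=0.5)
```

In this example the state alternates 0, 1, 0, 1, ..., so the value is $(1 - \beta)(\beta + \beta^3 + \cdots) = \beta / (1 + \beta)$, which equals $1/3$ for $\beta = 0.5$. The same routine applied to $d^{2,l}$ instead of $c^i$ gives the left-hand side of a discounted constraint.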

Subscription type cost [1]

The subscription type costs of player 1 are defined as in [1]:

$$D^{1,k}(f) = \sum_{s \in S} \sum_{a^1 \in A^1(s)} d^{1,k}(s, a^1) f(s, a^1)$$

for all $k = 1, 2, \dots, n_1$ and $f \in F_S$. These costs are called
subscription type costs because they are based only on the fraction of time
during which a given action is used at a given state, and not on how
frequently the state is visited and the action is used. This situation can
arise when using some services requires a subscription/registration fee for
their planned use, which can be paid in advance.
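A short Python sketch makes the point that the subscription cost depends on $f$ alone and involves no state frequencies. The function names and the numbers below are hypothetical, chosen only for illustration.

```python
def subscription_cost(d, f):
    """Sum of d(s, a1) * f(s, a1) over all state-action pairs of player 1."""
    return sum(d[s][a] * f[s][a] for s in range(len(f)) for a in range(len(f[s])))

def is_one_feasible(d_list, f, bounds):
    """Check every subscription constraint: cost of service k at most bounds[k]."""
    return all(subscription_cost(d, f) <= b for d, b in zip(d_list, bounds))

# Hypothetical data: two states, two actions, one subscribed service.
d = [[2.0, 0.0], [0.0, 4.0]]
f = [[0.5, 0.5], [0.25, 0.75]]
cost = subscription_cost(d, f)              # 2.0 * 0.5 + 4.0 * 0.75
feasible = is_one_feasible([d], f, [4.0])
infeasible = is_one_feasible([d], f, [3.5])
```

No transition probabilities and no strategy of player 2 appear in the computation, which is why these constraints can be checked on $f$ in isolation.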
Subscription type constraints

The subscription type constraints of player 1 are defined as

$$D^{1,k}(f) \le \xi^1_k, \quad \forall\, k = 1, 2, \dots, n_1. \tag{5}$$



We denote by $C^i$, $i = 1, 2$, and $D^{2,l}$, $l = 1, 2, \dots, n_2$, the
expected costs, which can be either average or discounted depending on the
criterion being used. Under the average cost criterion both players have
expected average costs, and under the discounted cost criterion both players
have expected discounted costs. Apart from this, player 1 has subscription
based costs which are constrained by some given reals. Player 1 has $n_1$
constraints, which are defined by (5), and player 2 has $n_2$ constraints,
which are defined as

$$D^{2,l}(\gamma, f^h, g^h) \le \xi^2_l, \quad \forall\, l = 1, 2, \dots, n_2. \tag{6}$$

The constraints (5) and (6) are called subscription based and realization
based constraints respectively. Both the players choose their actions
independently and want to minimize their expected costs subject to their
constraints from (5) and (6). We denote this constrained stochastic game by
$G^c$. As a Nash equilibrium exists in stationary strategies under assumptions
(A1)-(A2) given below [3], from now onwards we restrict ourselves to
stationary strategies.

The strategy pair $(f, g)$ is called 1-feasible if it satisfies (5) and
2-feasible if it satisfies (6). As the constraints (5) of player 1 do not
depend on the strategies of player 2, the strategy pair $(f, g)$ is 1-feasible
for all $g \in G_S$ if $f$ satisfies (5). A strategy pair $(f, g)$ is called
feasible if it is both 1-feasible and 2-feasible. Let $\mathcal{F}_S$ denote
the set of all feasible stationary strategy pairs for the constrained
stochastic game $G^c$. We shall assume throughout that $\mathcal{F}_S$ is
non-empty. Now, we recall the definition of Nash equilibrium as given in [3].
A strategy pair $(f^*, g^*) \in \mathcal{F}_S$ is called a Nash equilibrium of
the constrained stochastic game $G^c$ if it satisfies the following conditions:

$$C^1(\gamma, f^*, g^*) \le C^1(\gamma, f, g^*), \quad \forall\ \text{1-feasible } (f, g^*), \tag{7}$$

$$C^2(\gamma, f^*, g^*) \le C^2(\gamma, f^*, g^h), \quad \forall\ \text{2-feasible } (f^*, g^h). \tag{8}$$

Thus, a unilateral deviation of any player $i$, $i = 1, 2$, will either
violate the constraints of player $i$ or, if it does not, result in a cost for
that player that is not lower than the one achieved by the feasible strategy
pair $(f^*, g^*)$. The strategy pair $(f^*, g^*) \in \mathcal{F}_S$ satisfying
(7) and (8) would still be a Nash equilibrium of the constrained stochastic
game if we replaced the strategy $g^h$ by a stationary strategy $g$ in (8).
This can be seen by noticing that when the strategy of player 1 is fixed as a
stationary strategy $f^*$, player 2 is faced with a constrained Markov
decision process (CMDP), for which an optimal strategy always exists in the
space of stationary strategies [20].
Assumptions [Altman and Shwartz [3]]

(A1) Ergodicity: In the case of the average cost criterion the unichain ergodic
structure holds, i.e., under every stationary strategy $g$ the state process
is an irreducible Markov chain with one ergodic class (and possibly some
transient states).

(A2) Strong Slater condition: For player 2, there exists some $g'$ such that
for any strategy $f$ of player 1,

$$D^{2,l}(\gamma, f, g') < \xi^2_l, \quad \forall\, l = 1, 2, \dots, n_2.$$

As the constraints of player 1 are linear and do not depend on the strategies
of player 2, the strong Slater condition is not needed for the constraints of
player 1.

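We use the following notation throughout this section. For $i = 1, 2$,
$s \in S$, $l = 1, 2, \dots, n_2$,

• $C^i(s) = [c^i(s, a^1, a^2)]_{a^1 \in A^1(s),\, a^2 \in A^2(s)}$ and
$D^{2,l}(s) = [d^{2,l}(s, a^1, a^2)]_{a^1 \in A^1(s),\, a^2 \in A^2(s)}$, with rows
indexed by $a^1$ and columns by $a^2$.

• $C^i = \mathrm{diag}(C^i(1), C^i(2), \dots, C^i(|S|))$ and
$D^{2,l} = \mathrm{diag}(D^{2,l}(1), D^{2,l}(2), \dots, D^{2,l}(|S|))$.

• $x = (x(1)^T, x(2)^T, \dots, x(|S|)^T)^T$.

• $x(s) = (x(s, 1), x(s, 2), \dots, x(s, |A^2(s)|))^T$.

• $u = (u(1), u(2), \dots, u(|S|))^T$.

• $v \in \mathbb{R}$.

• $z = (z(1), z(2), \dots, z(|S|))^T$.

• $1_n = (1, 1, \dots, 1)^T \in \mathbb{R}^n$.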



2.1 Single controller constrained stochastic game with average cost criterion 

In this section we consider the game described in Section 2 with the average
cost criterion, where both players choose their strategies independently and
minimize their expected average costs as defined in (1) subject to their
constraints from (2) and (5). The constraints of player 1 given in (5) are
subscription based. The expected average constraints (2) of player 2 capture
the fact that the average consumption of resource $l$, $l = 1, 2, \dots, n_2$,
by player 2 is not more than the given $\xi^2_l$.

2.1.1 Average occupation measure 

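For an initial distribution $\gamma$ and a stationary strategy $g$ define the
average occupation measure

$$\pi_{ea}(\gamma, g) := \{\pi_{ea}(\gamma, g; s, a^2) : s \in S,\ a^2 \in A^2(s)\}.$$

For all $s \in S$, $a^2 \in A^2(s)$, $\pi_{ea}(\gamma, g; s, a^2)$ is given by

$$\pi_{ea}(\gamma, g; s, a^2) = \pi^g(s)\, g(s, a^2) \tag{9}$$

where $\pi^g = (\pi^g(1), \pi^g(2), \dots, \pi^g(|S|))$ is the steady state
distribution of the Markov chain induced by the stationary strategy $g$, which
exists and is unique under (A1). $\pi_{ea}(\gamma, g)$ can be considered as a
probability measure over $\mathcal{K}^2$ that assigns probability
$\pi_{ea}(\gamma, g; s, a^2)$ to the state-action pair $(s, a^2)$. The
occupation measure defined in (9) is independent of the initial distribution
$\gamma$, so we drop $\gamma$ from the notation. For a fixed strategy pair
$(f, g) \in F_S \times G_S$ the expected average costs of both the players can
be written in terms of the occupation measure as

$$C^i_{ea}(f, g) = \sum_{s \in S} \sum_{a^1 \in A^1(s)} \sum_{a^2 \in A^2(s)} c^i(s, a^1, a^2)\, f(s, a^1)\, \pi_{ea}(g; s, a^2), \quad i = 1, 2,$$

$$D^{2,l}_{ea}(f, g) = \sum_{s \in S} \sum_{a^1 \in A^1(s)} \sum_{a^2 \in A^2(s)} d^{2,l}(s, a^1, a^2)\, f(s, a^1)\, \pi_{ea}(g; s, a^2)$$

for all $l = 1, 2, \dots, n_2$.

Let $Q_{ea}$ be the set of vectors $x \in \mathbb{R}^{|\mathcal{K}^2|}$ satisfying

(i) $\sum_{(s, a^2) \in \mathcal{K}^2} (\delta(s, s') - p(s'|s, a^2))\, x(s, a^2) = 0, \quad \forall\, s' \in S$

(ii) $\sum_{(s, a^2) \in \mathcal{K}^2} x(s, a^2) = 1$

(iii) $x(s, a^2) \ge 0, \quad \forall\, s \in S,\ a^2 \in A^2(s)$.

$\delta(\cdot, \cdot)$ is the Kronecker delta, i.e.,

$$\delta(s, s') = \begin{cases} 1 & \text{if } s = s', \\ 0 & \text{if } s \ne s'. \end{cases}$$

The stationary strategies are complete, i.e., the set of occupation measures
achieved by history dependent strategies equals the set achieved by stationary
strategies, and further equals the set $Q_{ea}$ [20]. It is known that for
each $(s, a^2) \in \mathcal{K}^2$, $x(s, a^2) = \pi_{ea}(g; s, a^2)$, where

$$g(s, a^2) = \frac{x(s, a^2)}{\sum_{a^2 \in A^2(s)} x(s, a^2)} \tag{10}$$

whenever the denominator is nonzero (when it is zero $g(s)$ is chosen
arbitrarily from $\wp(A^2(s))$) [20].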
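The correspondence between a stationary strategy of player 2 and its occupation measure can be sketched numerically. The Python fragment below is a hypothetical illustration under stated assumptions: the function names, the nested-list layout `p[s][a2]` and the transition data are not from the paper, and the steady state distribution is approximated by iterating the lazy chain $(I + P)/2$, which has the same stationary distribution as $P$ and also converges when $P$ is periodic.

```python
def stationary_distribution(P, iters=5000):
    """Approximate pi with pi = pi P for an irreducible transition matrix P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # One step of the lazy chain (I + P) / 2.
        pi = [0.5 * pi[t] + 0.5 * sum(pi[s] * P[s][t] for s in range(n))
              for t in range(n)]
    return pi

def occupation_measure(p, g):
    """x(s, a2) = pi^g(s) * g(s, a2), where pi^g is stationary for the chain under g."""
    n = len(p)
    P = [[sum(g[s][a] * p[s][a][t] for a in range(len(g[s]))) for t in range(n)]
         for s in range(n)]
    pi = stationary_distribution(P)
    return [[pi[s] * g[s][a] for a in range(len(g[s]))] for s in range(n)]

def recover_strategy(x):
    """Normalize x(s, .) per state; use a uniform rule where the state mass is zero."""
    g = []
    for row in x:
        mass = sum(row)
        g.append([v / mass for v in row] if mass > 0 else [1.0 / len(row)] * len(row))
    return g

# Hypothetical two state, two action transition law of player 2.
p = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.6, 0.4]]]
g = [[1.0, 0.0], [0.0, 1.0]]
x = occupation_measure(p, g)
g_back = recover_strategy(x)
```

Under this $g$ the induced chain has rows $(0.9, 0.1)$ and $(0.6, 0.4)$, so the steady state distribution is $(6/7, 1/7)$, and normalizing $x$ state by state returns the original $g$.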

The cost of player 1 when he uses action $a^1$ at state $s$ and player 2 uses
strategy $g$ is given by

$$c^1(s, a^1; g) = \sum_{a^2 \in A^2(s)} c^1(s, a^1, a^2)\, \pi_{ea}(g; s, a^2).$$

Similarly, the costs of player 2 when he uses action $a^2$ at state $s$ and
player 1 uses strategy $f$ are given by

$$c^2(f; s, a^2) = \sum_{a^1 \in A^1(s)} c^2(s, a^1, a^2)\, f(s, a^1),$$

$$d^{2,l}(f; s, a^2) = \sum_{a^1 \in A^1(s)} d^{2,l}(s, a^1, a^2)\, f(s, a^1), \quad \forall\, l = 1, 2, \dots, n_2.$$
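These two contractions can be checked numerically. In the hypothetical Python sketch below (function names, the cost array `c` and the vectors `f` and `x` are illustrative assumptions; `x` is simply a nonnegative vector summing to one), contracting a cost array against the occupation measure and then against $f$, or against $f$ first and then against the occupation measure, gives the same bilinear value $f^T C x$.

```python
def cost_given_x(c, x):
    """Cost per (s, a1): sum over a2 of c(s, a1, a2) * x(s, a2)."""
    return [[sum(c[s][a1][a2] * x[s][a2] for a2 in range(len(x[s])))
             for a1 in range(len(c[s]))] for s in range(len(c))]

def cost_given_f(c, f):
    """Cost per (s, a2): sum over a1 of c(s, a1, a2) * f(s, a1)."""
    return [[sum(c[s][a1][a2] * f[s][a1] for a1 in range(len(f[s])))
             for a2 in range(len(c[s][0]))] for s in range(len(c))]

def inner(a, b):
    """Sum of entrywise products of two nested lists of the same shape."""
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))

# Hypothetical data with two states and two actions per player.
c = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
f = [[0.5, 0.5], [1.0, 0.0]]
x = [[0.6, 0.2], [0.1, 0.1]]
via_player1 = inner(cost_given_x(c, x), f)   # objective form used for player 1
via_player2 = inner(cost_given_f(c, f), x)   # objective form used for player 2
```

Both orders of summation give 1.9 here, which is the value $f^T C x$ for this data.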

2.1.2 Mathematical programming formulation 

We show the one to one correspondence between the stationary Nash equilibria
of the single controller constrained stochastic game $G^c$ with average cost
criterion and the global minima of a certain mathematical program.



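Best response linear programs

For a given stationary strategy of one player in a two player constrained
stochastic game, the best response of the other player is given by solving a
constrained Markov decision model, which, in turn, can be solved by a linear
program in the finite state-action setting [20]. For a fixed strategy $g$ of
player 2, the best response of player 1 can be obtained from the following
linear program:

$$\begin{aligned}
\min_{f}\ & \sum_{(s, a^1) \in \mathcal{K}^1} c^1(s, a^1; g)\, f(s, a^1) \\
\text{s.t.}\quad
& (i)\ \sum_{(s, a^1) \in \mathcal{K}^1} d^{1,k}(s, a^1)\, f(s, a^1) \le \xi^1_k, \quad \forall\, k = 1, 2, \dots, n_1 \\
& (ii)\ \sum_{a^1 \in A^1(s)} f(s, a^1) = 1, \quad \forall\, s \in S \\
& (iii)\ f(s, a^1) \ge 0, \quad \forall\, s \in S,\ a^1 \in A^1(s).
\end{aligned} \tag{11}$$

The dual of (11) is

$$\begin{aligned}
\max_{z,\, \delta^1}\ & \sum_{s \in S} z(s) - \sum_{k=1}^{n_1} \delta^1_k\, \xi^1_k \\
\text{s.t.}\quad
& (i)\ z(s) \le c^1(s, a^1; g) + \sum_{k=1}^{n_1} \delta^1_k\, d^{1,k}(s, a^1), \quad \forall\, s \in S,\ a^1 \in A^1(s) \\
& (ii)\ \delta^1_k \ge 0, \quad \forall\, k = 1, 2, \dots, n_1.
\end{aligned} \tag{12}$$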
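Similarly, for a fixed strategy $f$ of player 1, the best response of player 2
can be obtained from the following linear program:

$$\begin{aligned}
\min_{x}\ & \sum_{(s, a^2) \in \mathcal{K}^2} c^2(f; s, a^2)\, x(s, a^2) \\
\text{s.t.}\quad
& (i)\ \sum_{(s, a^2) \in \mathcal{K}^2} (\delta(s, s') - p(s'|s, a^2))\, x(s, a^2) = 0, \quad \forall\, s' \in S \\
& (ii)\ \sum_{(s, a^2) \in \mathcal{K}^2} x(s, a^2) = 1 \\
& (iii)\ \sum_{(s, a^2) \in \mathcal{K}^2} d^{2,l}(f; s, a^2)\, x(s, a^2) \le \xi^2_l, \quad \forall\, l = 1, 2, \dots, n_2 \\
& (iv)\ x(s, a^2) \ge 0, \quad \forall\, s \in S,\ a^2 \in A^2(s).
\end{aligned} \tag{13}$$

If $x^*$ is an optimal solution of the linear program (13), then the best
response strategy $g^*$ of player 2 can be obtained from (10) [20]. The dual of
the linear program (13) is given by

$$\begin{aligned}
\max_{v,\, u,\, \delta^2}\ & v - \sum_{l=1}^{n_2} \delta^2_l\, \xi^2_l \\
\text{s.t.}\quad
& (i)\ v + u(s) \le c^2(f; s, a^2) + \sum_{l=1}^{n_2} \delta^2_l\, d^{2,l}(f; s, a^2) + \sum_{s' \in S} p(s'|s, a^2)\, u(s'), \quad \forall\, s \in S,\ a^2 \in A^2(s) \\
& (ii)\ \delta^2_l \ge 0, \quad \forall\, l = 1, 2, \dots, n_2.
\end{aligned} \tag{14}$$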
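The primal-dual structure can be illustrated on player 1's pair of programs. The Python sketch below is a hypothetical numerical check, not a solver: all names and numbers are assumptions made for illustration, and the per-pair costs `c1g[s][a1]` are taken as given data standing for player 1's cost against a fixed strategy of player 2. It evaluates the primal objective at a feasible $f$ and the dual objective at a feasible $(z, \delta^1)$, showing that the dual value never exceeds the primal one and that the two meet at an optimal pair.

```python
def primal_value(c1g, f):
    """Objective of player 1's primal program: sum of c1g(s, a1) * f(s, a1)."""
    return sum(c1g[s][a] * f[s][a] for s in range(len(f)) for a in range(len(f[s])))

def dual_value(z, delta1, xi1):
    """Objective of player 1's dual program: sum_s z(s) - sum_k delta1_k * xi1_k."""
    return sum(z) - sum(dk * xk for dk, xk in zip(delta1, xi1))

def dual_feasible(z, delta1, c1g, d1, tol=1e-12):
    """Check delta1 >= 0 and z(s) <= c1g(s, a1) + sum_k delta1_k * d1[k][s][a1]."""
    if any(dk < 0 for dk in delta1):
        return False
    return all(
        z[s] <= c1g[s][a] + sum(delta1[k] * d1[k][s][a] for k in range(len(d1))) + tol
        for s in range(len(c1g)) for a in range(len(c1g[s])))

# Hypothetical data: two states, two actions, one subscription constraint.
c1g = [[1.0, 2.6], [0.1, 0.1]]
d1 = [[[2.0, 0.0], [0.0, 0.0]]]      # d1[k][s][a1]
xi1 = [1.0]
f = [[0.5, 0.5], [1.0, 0.0]]         # subscription cost 2.0 * 0.5 = 1.0, so feasible
primal = primal_value(c1g, f)

weak = dual_value([1.0, 0.1], [0.0], xi1)    # a feasible but non-optimal dual point
tight = dual_value([2.6, 0.1], [0.8], xi1)   # a feasible dual point matching the primal
ok_weak = dual_feasible([1.0, 0.1], [0.0], c1g, d1)
ok_tight = dual_feasible([2.6, 0.1], [0.8], c1g, d1)
```

With these numbers the primal value is 1.9, the first dual point gives 1.1, and the second gives 1.9, so the gap between primal and dual objectives closes exactly when both points are optimal.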

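We denote the decision variables and the objective function of the
mathematical program [MP1] by
$\eta = (v, u^T, z^T, f^T, x^T, (\delta^1)^T, (\delta^2)^T)^T$ and $\psi(\eta)$
respectively.

Theorem 1. (a) If $(f^*, g^*)$ is a Nash equilibrium of the constrained
stochastic game $G^c$ with average cost criterion, then there exists a vector
$\eta^* = (v^*, u^{*T}, z^{*T}, f^{*T}, x^{*T}, (\delta^{1*})^T, (\delta^{2*})^T)^T$
such that it is a global minimum of the mathematical program [MP1] given below

$$\begin{aligned}
[\mathrm{MP1}]\quad \min_{\eta}\ & \left[ f^T C^1 x - \left( 1_{|S|}^T z - (\delta^1)^T \xi^1 \right) \right] + \left[ f^T C^2 x - \left( v - (\delta^2)^T \xi^2 \right) \right] \\
\text{s.t.}\quad
& (i)\ v + u(s) \le \left[ f(s)^T C^2(s) \right]_{a^2} + \sum_{l=1}^{n_2} \delta^2_l \left[ f(s)^T D^{2,l}(s) \right]_{a^2} + \sum_{s' \in S} p(s'|s, a^2)\, u(s'), \quad \forall\, s \in S,\ a^2 \in A^2(s) \\
& (ii)\ z(s) \le \left[ C^1(s)\, x(s) \right]_{a^1} + \sum_{k=1}^{n_1} \delta^1_k\, d^{1,k}(s, a^1), \quad \forall\, s \in S,\ a^1 \in A^1(s) \\
& (iii)\ \sum_{(s, a^2) \in \mathcal{K}^2} \left[ \delta(s, s') - p(s'|s, a^2) \right] x(s, a^2) = 0, \quad \forall\, s' \in S \\
& (iv)\ \sum_{(s, a^2) \in \mathcal{K}^2} x(s, a^2) = 1 \\
& (v)\ \sum_{(s, a^1) \in \mathcal{K}^1} d^{1,k}(s, a^1)\, f(s, a^1) \le \xi^1_k, \quad \forall\, k = 1, 2, \dots, n_1 \\
& (vi)\ f^T D^{2,l} x \le \xi^2_l, \quad \forall\, l = 1, 2, \dots, n_2 \\
& (vii)\ \sum_{a^1 \in A^1(s)} f(s, a^1) = 1, \quad \forall\, s \in S \\
& (viii)\ f(s, a^1) \ge 0, \quad \forall\, s \in S,\ a^1 \in A^1(s) \\
& (ix)\ x(s, a^2) \ge 0, \quad \forall\, s \in S,\ a^2 \in A^2(s) \\
& (x)\ \delta^1_k \ge 0, \quad \forall\, k = 1, 2, \dots, n_1 \\
& (xi)\ \delta^2_l \ge 0, \quad \forall\, l = 1, 2, \dots, n_2
\end{aligned}$$

with $\psi(\eta^*) = 0$.

(b) If $\eta^* = (v^*, u^{*T}, z^{*T}, f^{*T}, x^{*T}, (\delta^{1*})^T, (\delta^{2*})^T)^T$
is a global minimum of [MP1] with $\psi(\eta^*) = 0$, then $(f^*, g^*)$ is a
Nash equilibrium of the constrained stochastic game $G^c$ with average cost
criterion, where

$$g^*(s, a^2) = \frac{x^*(s, a^2)}{\sum_{a^2 \in A^2(s)} x^*(s, a^2)}$$

for all $s \in S$, $a^2 \in A^2(s)$ whenever the denominator is non-zero (when
it is zero $g^*(s)$ is chosen arbitrarily from $\wp(A^2(s))$).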
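A small special case may help in reading [MP1]. Assume, purely for illustration, a game with a single state and no constraints ($n_1 = n_2 = 0$). Then $u$ cancels from constraint (i), constraints (iii) and (iv) reduce to $x$ being a probability vector over the actions of player 2, and the objective becomes $(f^T C^1 x - z) + (f^T C^2 x - v)$ with $z \le [C^1 x]_{a^1}$ and $v \le [f^T C^2]_{a^2}$. The Python sketch below, with hypothetical function names and a matching pennies cost matrix chosen as an example, evaluates this objective at the largest feasible $z$ and $v$.

```python
def bilinear(f, C, x):
    """Compute f^T C x for a cost matrix C with rows a1 and columns a2."""
    return sum(f[i] * C[i][j] * x[j] for i in range(len(f)) for j in range(len(x)))

def best_duals(C1, C2, f, x):
    """Largest z and v allowed by constraints (ii) and (i) in the single state case."""
    z = min(sum(C1[i][j] * x[j] for j in range(len(x))) for i in range(len(f)))
    v = min(sum(f[i] * C2[i][j] for i in range(len(f))) for j in range(len(x)))
    return z, v

def psi(C1, C2, f, x):
    """Objective (f'C1x - z) + (f'C2x - v) evaluated at the best feasible z and v."""
    z, v = best_duals(C1, C2, f, x)
    return (bilinear(f, C1, x) - z) + (bilinear(f, C2, x) - v)

# Matching pennies written as costs: player 1 pays 1 on a match, player 2 on a mismatch.
C1 = [[1.0, -1.0], [-1.0, 1.0]]
C2 = [[-1.0, 1.0], [1.0, -1.0]]
at_equilibrium = psi(C1, C2, [0.5, 0.5], [0.5, 0.5])
off_equilibrium = psi(C1, C2, [1.0, 0.0], [1.0, 0.0])
```

The objective vanishes at the mixed equilibrium and is strictly positive (equal to 2) at the pure profile, in line with the statement that equilibria correspond to global minima with objective value zero. In this special case the program has the same shape as the quadratic program for bimatrix games.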

Proof (a) Let $(f^*, g^*)$ be a Nash equilibrium of the constrained stochastic game $G^c$ with average cost criterion. We construct the occupation measure $x^*$ corresponding to $g^*$ as given in (9); then $x^*$ satisfies (iii), (iv) and (ix) of [MP1]. The strategy pair $(f^*, g^*)$ is feasible because it is a Nash equilibrium, so $(f^*, x^*)$ satisfies (v)-(viii) of [MP1]. As $f^*$ and $g^*$ are best responses to each other, $x^*$ as constructed above is an optimal solution of the linear program (13) for fixed $f^*$, by Proposition 3.1(iii) of [3]. By the strong duality theorem [21], [18] there exists an optimal solution $(v^*, u^*, \delta^{2*})$ of (14) such that $(v^*, u^*, f^*, \delta^{2*})$ satisfies (i) and (xi) of [MP1] and the objective function values of (13) and (14) are equal. Similarly, $f^*$ is an optimal solution of the linear program (11) for fixed $g^*$, and hence there exists an optimal solution $(z^*, \delta^{1*})$ of (12) such that $(z^*, x^*, \delta^{1*})$ satisfies (ii) and (x) of [MP1] and the objective function values of (11) and (12) are equal. In other words, we have a point $\eta^* = (v^*, u^{*T}, z^{*T}, f^{*T}, x^{*T}, (\delta^{1*})^T, (\delta^{2*})^T)^T$ such that (i), (ii), (x) and (xi) are satisfied and

$f^{*T} C^1 x^* = \mathbf{1}_{|S|}^T z^* - (\delta^{1*})^T \xi^1$,
$f^{*T} C^2 x^* = v^* - (\delta^{2*})^T \xi^2$.

Thus, $\eta^*$ is a feasible point of the mathematical program [MP1] and, from the construction of the objective function, $\phi(\eta^*) = 0$.

Let $\eta$ be any feasible point of [MP1]. Multiply each constraint in (ii) of [MP1] corresponding to the pair $(s, a^1)$ by $f(s, a^1)$, add over all $(s, a^1) \in \mathcal{K}^1$, and use the constraints (v), (vii), (viii) and (x) to obtain

$f^T C^1 x \ge \mathbf{1}_{|S|}^T z - (\delta^1)^T \xi^1$. (15)

By a similar argument, i.e., multiplying each constraint in (i) of [MP1] corresponding to the pair $(s, a^2)$ by $x(s, a^2)$, adding over all $(s, a^2) \in \mathcal{K}^2$, and using the constraints (iii), (iv), (vi), (ix) and (xi), we have

$f^T C^2 x \ge v - (\delta^2)^T \xi^2$. (16)

From (15) and (16), $\phi(\eta) \ge 0$ for all feasible points $\eta$ of [MP1]. Thus $\eta^*$ is a global minimum of [MP1].

(b) Let $\eta^*$ be a global minimum of [MP1] such that $\phi(\eta^*) = 0$. As $\eta^*$ is a feasible point of [MP1], (15) and (16) also hold for $\eta^*$, i.e.,

$f^{*T} C^1 x^* \ge \mathbf{1}_{|S|}^T z^* - (\delta^{1*})^T \xi^1$,
$f^{*T} C^2 x^* \ge v^* - (\delta^{2*})^T \xi^2$.

From the above, both terms of the objective function are non-negative at $\eta^*$, but the objective function value is zero at $\eta^*$, which means that both terms are individually zero, i.e.,

$f^{*T} C^1 x^* = \mathbf{1}_{|S|}^T z^* - (\delta^{1*})^T \xi^1$,
$f^{*T} C^2 x^* = v^* - (\delta^{2*})^T \xi^2$. (17)

Fix $\eta^*$. By the same argument as used for (15), and by using the constraints (v), (vii), (viii), (x) and (17), we have the following inequality:

$f^{*T} C^1 x^* \le f^T C^1 x^*$, $\forall$ 1-feasible $(f, x^*)$.

Similarly we have

$f^{*T} C^2 x^* \le f^{*T} C^2 x$, $\forall$ 2-feasible $(f^*, x)$.

In other words,

$C^1_{ea}(f^*, g^*) \le C^1_{ea}(f, g^*)$, $\forall$ 1-feasible $(f, g^*)$,
$C^2_{ea}(f^*, g^*) \le C^2_{ea}(f^*, g)$, $\forall$ 2-feasible $(f^*, g)$,

where

$g^*(s,a^2) = \dfrac{x^*(s,a^2)}{\sum_{a^2 \in A^2(s)} x^*(s,a^2)}$

for all $s \in S$, $a^2 \in A^2(s)$ whenever the denominator is non-zero (when it is zero, $g^*(s)$ is chosen arbitrarily from $\wp(A^2(s))$). This implies that $(f^*, g^*)$ is a Nash equilibrium of the constrained stochastic game $G^c$ with average cost criterion.

Remark 1 Because the diagonal elements of the Hessian matrix of the objective function are zero, its trace, and hence the sum of its eigenvalues, is zero. As the Hessian is a non-zero symmetric matrix, it has some positive as well as some negative eigenvalues. So, the objective function of [MP1] is a non-convex function. As there are some non-convex constraints, the feasible region is also not a convex set. So, [MP1] is a non-convex constrained optimization problem.
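The indefiniteness noted in Remark 1 can be seen on the smallest bilinear building block of the objective. The sketch below is illustrative only: `bilinear` is a hypothetical stand-in for a single product term $f(s,a^1)\,x(s,a^2)$ with unit cost, not the objective of [MP1] itself. It evaluates the midpoint test for convexity along two directions and finds that the test fails in one direction for convexity and in the other for concavity.

```python
def bilinear(f, x):
    """One product term f * x of the kind found in f^T C x (unit cost is an assumption)."""
    return f * x


def midpoint(p, q):
    """Midpoint of two points given as tuples."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def convexity_gap(p, q):
    """Value at the midpoint minus the average of the endpoint values.

    A convex function never yields a positive gap,
    and a concave function never yields a negative gap.
    """
    m = midpoint(p, q)
    return bilinear(*m) - 0.5 * (bilinear(*p) + bilinear(*q))


# The Hessian of (f, x) -> f * x is [[0, 1], [1, 0]]: zero diagonal, eigenvalues -1 and +1.
gap_negative_direction = convexity_gap((1.0, -1.0), (-1.0, 1.0))   # along eigenvector (1, -1)
gap_positive_direction = convexity_gap((1.0, 1.0), (-1.0, -1.0))   # along eigenvector (1, 1)
```

A positive gap along $(1,-1)$ rules out convexity and a negative gap along $(1,1)$ rules out concavity, which is the behaviour expected of a function whose Hessian has eigenvalues of both signs.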

2.1.3 Special cases 

We consider two special cases. First, we consider the nonzero sum game as defined in Section 2 with average cost criterion where the constraints of player 2 do not depend on the strategies of player 1. Next, we briefly consider the zero sum game as considered in [6].

(i) Quadratic program in the case of decoupled constraints

We consider the situation where the constraints of player 2 do not depend on the strategies of player 1. This happens when the immediate costs of player 2 which correspond to his constraints do not depend on the actions of player 1, i.e.,

$d^{2,l}(s,a^1,a^2) = d^{2,l}(s,a^2)$, $\forall\, s \in S,\ a^1 \in A^1(s),\ a^2 \in A^2(s)$ and $\forall\, l = 1, 2, \dots, n_2$. (18)

Under this condition [MP1] reduces to the quadratic program [QP1] given below:

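[QP1] $\min_{\eta} \left[ \left( f^T C^1 x - \left( \mathbf{1}_{|S|}^T z - (\delta^1)^T \xi^1 \right) \right) + \left( f^T C^2 x - \left( v - (\delta^2)^T \xi^2 \right) \right) \right]$
s.t.

(i) $v + u(s) \le \left[ (f(s))^T C^2(s) \right]_{a^2} + \sum_{l=1}^{n_2} \delta^2_l d^{2,l}(s,a^2) + \sum_{s' \in S} p(s'|s,a^2) u(s')$, $\forall\, s \in S,\ a^2 \in A^2(s)$

(ii) $z(s) \le \left[ C^1(s) x(s) \right]_{a^1} + \sum_{k=1}^{n_1} \delta^1_k d^{1,k}(s,a^1)$, $\forall\, s \in S,\ a^1 \in A^1(s)$

(iii) $\sum_{(s,a^2) \in \mathcal{K}^2} \left[ \delta(s,s') - p(s'|s,a^2) \right] x(s,a^2) = 0$, $\forall\, s' \in S$

(iv) $\sum_{(s,a^2) \in \mathcal{K}^2} x(s,a^2) = 1$

(v) $\sum_{(s,a^1) \in \mathcal{K}^1} d^{1,k}(s,a^1) f(s,a^1) \le \xi^1_k$, $\forall\, k = 1, 2, \dots, n_1$

(vi) $\sum_{(s,a^2) \in \mathcal{K}^2} d^{2,l}(s,a^2) x(s,a^2) \le \xi^2_l$, $\forall\, l = 1, 2, \dots, n_2$

(vii) $\sum_{a^1 \in A^1(s)} f(s,a^1) = 1$, $\forall\, s \in S$

(viii) $f(s,a^1) \ge 0$, $\forall\, s \in S,\ a^1 \in A^1(s)$

(ix) $x(s,a^2) \ge 0$, $\forall\, s \in S,\ a^2 \in A^2(s)$

(x) $\delta^1_k \ge 0$, $\forall\, k = 1, 2, \dots, n_1$

(xi) $\delta^2_l \ge 0$, $\forall\, l = 1, 2, \dots, n_2$.
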

(ii) Zero sum single controller constrained stochastic games

The zero sum single controller constrained stochastic game with average cost criterion is considered in [6]. We assume that player 1 minimizes the expected average cost of the game and player 2 has the opposite objective, i.e., he maximizes the expected average cost of the game. In [6], the player who controls the transition probabilities has realization based constraints and the other player has no constraints; these games can be solved by a linear program. By substituting $C^1(s) = -C^2(s) = C(s)$ for all $s \in S$ and dropping the subscription based constraints, the quadratic program [QP1] can be reduced to a primal-dual pair of linear programs, which are the same as those given in [6].

2.2 Single controller constrained stochastic game with discounted cost 
criterion 

In this section we consider the game described in Section 2 with discounted
cost criterion where both players choose their strategies independently and 
minimize their expected discounted costs as defined in ([3]) subject to their 
constraints from ([5]) , (|1]) . The constraints of player 1 given in ([S]) are subscrip- 
tion based. The expected discounted constraints (U) of player 2 captures the 
fact that the discounted cost for the consumption of resource $l$, $l = 1, 2, \dots, n_2$, by player 2 is not more than the given bound $\xi^2_l$. Similar to the average cost criterion, we give one mathematical program which characterizes the stationary Nash equilibria of these games.

2.2.1 Discounted occupation measure 

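For an initial distribution $\gamma$ and a stationary strategy $g$, define the discounted occupation measure as follows. For all $s \in S$, $a^2 \in A^2(s)$, $\pi^2_\beta(\gamma, g; s, a^2)$ is given by

$\pi^2_\beta(\gamma, g; s, a^2) = (1 - \beta) \left( \sum_{t=0}^{\infty} \beta^t \sum_{s' \in S} \gamma(s') \left( [P(g)]^t \right)_{s's} \right) g(s,a^2)$, (19)

where $[P(g)]^0$ is the identity matrix. $\pi^2_\beta(\gamma, g)$ can be considered as a probability measure over $\mathcal{K}^2$ that assigns probability $\pi^2_\beta(\gamma, g; s, a^2)$ to the state-action pair $(s, a^2)$. For a fixed strategy pair $(f, g) \in F_S \times G_S$ the expected discounted costs of both players can be written in terms of the occupation measure as

$C^i_\beta(\gamma, f, g) = \sum_{s \in S} \sum_{a^1 \in A^1(s)} \sum_{a^2 \in A^2(s)} f(s,a^1)\, c^i(s,a^1,a^2)\, \pi^2_\beta(\gamma, g; s, a^2)$, $i = 1, 2$,
$D^{2,l}_\beta(\gamma, f, g) = \sum_{s \in S} \sum_{a^1 \in A^1(s)} \sum_{a^2 \in A^2(s)} f(s,a^1)\, d^{2,l}(s,a^1,a^2)\, \pi^2_\beta(\gamma, g; s, a^2)$,

for all $l = 1, 2, \dots, n_2$.

Let $Q_\beta(\gamma)$ be the set of vectors $x \in \mathbb{R}^{|\mathcal{K}^2|}$ satisfying

(i) $\sum_{(s,a^2) \in \mathcal{K}^2} \left( \delta(s,s') - \beta p(s'|s,a^2) \right) x(s,a^2) = (1 - \beta) \gamma(s')$, $\forall\, s' \in S$

(ii) $x(s, a^2) \ge 0$, $\forall\, s \in S,\ a^2 \in A^2(s)$.

By summing the first constraint over $s'$ we note that $\sum_{(s,a^2) \in \mathcal{K}^2} x(s, a^2) = 1$, so the $x$ satisfying the above constraints are probability measures. The stationary strategies are complete, i.e., the set of occupation measures achieved by history dependent strategies equals the set achieved by stationary strategies, and further equals the set $Q_\beta(\gamma)$ [10]. It is known that for each $(s,a^2) \in \mathcal{K}^2$, $x(s, a^2) = \pi^2_\beta(\gamma, g; s, a^2)$ where

$g(s,a^2) = \dfrac{x(s,a^2)}{\sum_{a^2 \in A^2(s)} x(s,a^2)}$ (20)

whenever the denominator is nonzero (when it is zero, $g(s)$ is chosen arbitrarily from $\wp(A^2(s))$) [20].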
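As a small numerical illustration of the discounted occupation measure and of the constraints defining $Q_\beta(\gamma)$, the following sketch evaluates a truncated version of the series in (19) and checks that the result is a probability measure satisfying the flow constraint (i). The transition probabilities, the strategy `g`, the truncation `horizon` and all function names are hypothetical choices made for this example; they are not data from the paper.

```python
def discounted_occupation(gamma, P, g, beta, horizon=200):
    """Truncated series (19): x(s,a) = (1-beta) * sum_t beta^t * Pr(X_t = s) * g(s,a).

    gamma[s]   : initial distribution
    P[s][a][t] : transition probability p(t | s, a), controlled by player 2
    g[s][a]    : stationary strategy of player 2
    """
    n = len(gamma)
    # Transition matrix P(g) of the Markov chain induced by the strategy g
    Pg = [[sum(g[s][a] * P[s][a][t] for a in range(len(g[s]))) for t in range(n)]
          for s in range(n)]
    dist = list(gamma)       # distribution of X_t, starting with t = 0
    weight = [0.0] * n       # accumulates (1 - beta) * beta^t * Pr(X_t = s)
    discount = 1.0
    for _ in range(horizon):
        for s in range(n):
            weight[s] += (1.0 - beta) * discount * dist[s]
        dist = [sum(dist[s] * Pg[s][t] for s in range(n)) for t in range(n)]
        discount *= beta
    return [[weight[s] * g[s][a] for a in range(len(g[s]))] for s in range(n)]


def flow_residuals(x, gamma, P, beta):
    """Left side minus right side of constraint (i) of Q_beta(gamma), one entry per s'."""
    n = len(gamma)
    residuals = []
    for t in range(n):
        lhs = sum(((1.0 if s == t else 0.0) - beta * P[s][a][t]) * x[s][a]
                  for s in range(n) for a in range(len(x[s])))
        residuals.append(lhs - (1.0 - beta) * gamma[t])
    return residuals


# Hypothetical two-state, two-action data (not taken from the paper)
gamma = [0.5, 0.5]
beta = 0.5
P = [[[0.5, 0.5], [0.2, 0.8]],
     [[0.7, 0.3], [0.4, 0.6]]]
g = [[0.6, 0.4], [1.0, 0.0]]

x = discounted_occupation(gamma, P, g, beta)
total_mass = sum(sum(row) for row in x)
residuals = flow_residuals(x, gamma, P, beta)
```

The truncation error is of order $\beta^{\text{horizon}}$, so for this data `total_mass` agrees with 1 and the residuals agree with 0 up to floating point precision.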

The cost of player 1 when he uses action $a^1$ at state $s$ and player 2 uses strategy $g$ is given by

$c^1(s,a^1;g) = \sum_{a^2 \in A^2(s)} c^1(s,a^1,a^2)\, \pi^2_\beta(\gamma, g; s, a^2)$.

Similarly, the costs of player 2 when he uses action $a^2$ at state $s$ and player 1 uses strategy $f$ are given by

$c^2(f;s,a^2) = \sum_{a^1 \in A^1(s)} c^2(s,a^1,a^2) f(s,a^1)$,
$d^{2,l}(f;s,a^2) = \sum_{a^1 \in A^1(s)} d^{2,l}(s,a^1,a^2) f(s,a^1)$, $\forall\, l = 1, 2, \dots, n_2$.

2.2.2 Mathematical programming formulation

Similar to the average cost criterion, we show the one to one correspondence between the stationary Nash equilibria of this class of games and the global minima of a certain mathematical program.



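Best response linear programs

For a fixed strategy $g$ of player 2, the best response of player 1 can be obtained from the following linear program:

$\min_{f} \sum_{(s,a^1) \in \mathcal{K}^1} c^1(s,a^1;g) f(s,a^1)$
s.t.
(i) $\sum_{a^1 \in A^1(s)} f(s,a^1) = 1$, $\forall\, s \in S$
(ii) $\sum_{(s,a^1) \in \mathcal{K}^1} d^{1,k}(s,a^1) f(s,a^1) \le \xi^1_k$, $\forall\, k = 1, 2, \dots, n_1$
(iii) $f(s,a^1) \ge 0$, $\forall\, s \in S,\ a^1 \in A^1(s)$. (21)

The dual of (21) is

$\max_{z, \delta^1} \left[ \sum_{s \in S} z(s) - \sum_{k=1}^{n_1} \delta^1_k \xi^1_k \right]$
s.t.
(i) $z(s) \le c^1(s,a^1;g) + \sum_{k=1}^{n_1} \delta^1_k d^{1,k}(s,a^1)$, $\forall\, s \in S,\ a^1 \in A^1(s)$
(ii) $\delta^1_k \ge 0$, $\forall\, k = 1, 2, \dots, n_1$. (22)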



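Similarly, for a fixed strategy $f$ of player 1, the best response of player 2 can be obtained from the following linear program:

$\min_{x} \sum_{(s,a^2) \in \mathcal{K}^2} c^2(f;s,a^2) x(s,a^2)$
s.t.
(i) $\sum_{(s,a^2) \in \mathcal{K}^2} \left[ \delta(s,s') - \beta p(s'|s,a^2) \right] x(s,a^2) = (1 - \beta) \gamma(s')$, $\forall\, s' \in S$
(ii) $\sum_{(s,a^2) \in \mathcal{K}^2} d^{2,l}(f;s,a^2) x(s,a^2) \le \xi^2_l$, $\forall\, l = 1, 2, \dots, n_2$
(iii) $x(s,a^2) \ge 0$, $\forall\, s \in S,\ a^2 \in A^2(s)$. (23)

If $x^*$ is the optimal solution of the linear program (23), then the best response strategy $g^*$ of player 2 can be obtained from (20) [20]. The dual of the linear program (23) is given by

$\max_{u, \delta^2} \left[ \sum_{s \in S} (1 - \beta) \gamma(s) u(s) - \sum_{l=1}^{n_2} \delta^2_l \xi^2_l \right]$
s.t.
(i) $u(s) \le c^2(f;s,a^2) + \sum_{l=1}^{n_2} \delta^2_l d^{2,l}(f;s,a^2) + \beta \sum_{s' \in S} p(s'|s,a^2) u(s')$, $\forall\, s \in S,\ a^2 \in A^2(s)$
(ii) $\delta^2_l \ge 0$, $\forall\, l = 1, 2, \dots, n_2$. (24)

By using the best response linear programs (21), (22), (23), (24) we obtain results similar to those for the average cost criterion.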

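Theorem 2 (a) If $(f^*, g^*)$ is a Nash equilibrium of the constrained stochastic game $G^c$ with discounted cost criterion, then there exists a vector $\eta^* = (u^{*T}, z^{*T}, f^{*T}, x^{*T}, (\delta^{1*})^T, (\delta^{2*})^T)^T$ that is a global minimum of the mathematical program [MP2] given below,

[MP2] $\min_{\eta} \left[ \left( f^T C^1 x - \left( \mathbf{1}_{|S|}^T z - (\delta^1)^T \xi^1 \right) \right) + \left( f^T C^2 x - \left( (1 - \beta) \gamma^T u - (\delta^2)^T \xi^2 \right) \right) \right]$
s.t.

(i) $u(s) \le \left[ (f(s))^T C^2(s) \right]_{a^2} + \sum_{l=1}^{n_2} \delta^2_l \left[ (f(s))^T D^{2,l}(s) \right]_{a^2} + \beta \sum_{s' \in S} p(s'|s,a^2) u(s')$, $\forall\, s \in S,\ a^2 \in A^2(s)$

(ii) $z(s) \le \left[ C^1(s) x(s) \right]_{a^1} + \sum_{k=1}^{n_1} \delta^1_k d^{1,k}(s,a^1)$, $\forall\, s \in S,\ a^1 \in A^1(s)$

(iii) $\sum_{(s,a^2) \in \mathcal{K}^2} \left[ \delta(s,s') - \beta p(s'|s,a^2) \right] x(s,a^2) = (1 - \beta) \gamma(s')$, $\forall\, s' \in S$

(iv) $\sum_{(s,a^1) \in \mathcal{K}^1} d^{1,k}(s,a^1) f(s,a^1) \le \xi^1_k$, $\forall\, k = 1, 2, \dots, n_1$

(v) $\sum_{s \in S} (f(s))^T D^{2,l}(s) x(s) \le \xi^2_l$, $\forall\, l = 1, 2, \dots, n_2$

(vi) $\sum_{a^1 \in A^1(s)} f(s,a^1) = 1$, $\forall\, s \in S$

(vii) $f(s,a^1) \ge 0$, $\forall\, s \in S,\ a^1 \in A^1(s)$

(viii) $x(s,a^2) \ge 0$, $\forall\, s \in S,\ a^2 \in A^2(s)$

(ix) $\delta^1_k \ge 0$, $\forall\, k = 1, 2, \dots, n_1$

(x) $\delta^2_l \ge 0$, $\forall\, l = 1, 2, \dots, n_2$,

with $\phi(\eta^*) = 0$.

(b) If $\eta^* = (u^{*T}, z^{*T}, f^{*T}, x^{*T}, (\delta^{1*})^T, (\delta^{2*})^T)^T$ is a global minimum of [MP2] with $\phi(\eta^*) = 0$, then $(f^*, g^*)$ is a Nash equilibrium of the constrained stochastic game $G^c$ with discounted cost criterion, where

$g^*(s,a^2) = \dfrac{x^*(s,a^2)}{\sum_{a^2 \in A^2(s)} x^*(s,a^2)}$

for all $s \in S$, $a^2 \in A^2(s)$ whenever the denominator is non-zero (when it is zero, $g^*(s)$ is chosen arbitrarily from $\wp(A^2(s))$).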

Proof We can prove this by using the best response linear programs (21), (22), (23), (24) and an argument similar to the one given in the proof of Theorem 1.

Remark 2 Similar to [MP1], [MP2] is also a non-convex constrained optimization problem.

Remark 3 Both [MP1] and [MP2] can be obtained from a single mathematical program [MP4] given in Appendix (A).

2.2.3 Special cases 

We consider two special cases. First, we consider the nonzero sum game as defined in Section 2 with discounted cost criterion where the constraints of player 2 do not depend on the strategies of player 1. Next, we briefly consider the zero sum game as considered in [1].

(i) Quadratic program in case of decoupled constraints 

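When the constraints of player 2 do not depend on the strategies of player 1, i.e., under condition (18), the mathematical program [MP2] reduces to the quadratic program [QP2] given below:

[QP2] $\min_{\eta} \left[ \left( f^T C^1 x - \left( \mathbf{1}_{|S|}^T z - (\delta^1)^T \xi^1 \right) \right) + \left( f^T C^2 x - \left( (1 - \beta) \gamma^T u - (\delta^2)^T \xi^2 \right) \right) \right]$
s.t.

(i) $u(s) \le \left[ (f(s))^T C^2(s) \right]_{a^2} + \sum_{l=1}^{n_2} \delta^2_l d^{2,l}(s,a^2) + \beta \sum_{s' \in S} p(s'|s,a^2) u(s')$, $\forall\, s \in S,\ a^2 \in A^2(s)$

(ii) $z(s) \le \left[ C^1(s) x(s) \right]_{a^1} + \sum_{k=1}^{n_1} \delta^1_k d^{1,k}(s,a^1)$, $\forall\, s \in S,\ a^1 \in A^1(s)$

(iii) $\sum_{(s,a^2) \in \mathcal{K}^2} \left[ \delta(s,s') - \beta p(s'|s,a^2) \right] x(s,a^2) = (1 - \beta) \gamma(s')$, $\forall\, s' \in S$

(iv) $\sum_{(s,a^1) \in \mathcal{K}^1} d^{1,k}(s,a^1) f(s,a^1) \le \xi^1_k$, $\forall\, k = 1, 2, \dots, n_1$

(v) $\sum_{(s,a^2) \in \mathcal{K}^2} d^{2,l}(s,a^2) x(s,a^2) \le \xi^2_l$, $\forall\, l = 1, 2, \dots, n_2$

(vi) $\sum_{a^1 \in A^1(s)} f(s,a^1) = 1$, $\forall\, s \in S$

(vii) $f(s,a^1) \ge 0$, $\forall\, s \in S,\ a^1 \in A^1(s)$

(viii) $x(s,a^2) \ge 0$, $\forall\, s \in S,\ a^2 \in A^2(s)$

(ix) $\delta^1_k \ge 0$, $\forall\, k = 1, 2, \dots, n_1$

(x) $\delta^2_l \ge 0$, $\forall\, l = 1, 2, \dots, n_2$.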

(ii) Zero sum single controller constrained stochastic games

The zero sum single controller constrained stochastic game with discounted cost criterion is considered in [1]. In [1], the first player has subscription based constraints and the second player has realization based constraints which do not depend on the strategies of the first player; these games can be solved by a linear program. Setting $C^1(s) = -C^2(s) = C(s)$ for all $s \in S$, the quadratic program [QP2] can be separated into a primal-dual pair of linear programs, which are the same as those given in [1].



2.3 A Numerical Example 

We give one numerical example where immediate costs of player 2 correspond- 
ing to his constraints do not depend on the actions of player 1. We compute the 
Nash equilibrium of this game by solving corresponding quadratic program. 
The components of the stochastic game are 

1. The state space S = {1,2}. 

2. The action sets of both the players are $A^i(s) = \{1, 2\}$, $i = 1, 2$, $s = 1, 2$.

3. The immediate costs of both the players, which define the expected costs that they want to minimize, and the transition probabilities of the game are given in Table 1(a) and 1(b).

Table 1 Immediate costs and transition probabilities: (a) $s = 1$, (b) $s = 2$


In both the tables above, the entry in the upper triangle of each box gives the transition probabilities and the entry in the lower triangle gives the immediate costs of both the players corresponding to the actions chosen by both the players in that state. For example, at state 1 when both the players choose action 1, player 1 gets immediate cost 5 and player 2 gets 4 and
this is represented by entry (5,4) and game will remain in state 1 with 
probability ^ and it can move to state 2 with probability ^ and this is 
represented by entry (i, i) in the table corresponding to state 1. It is easy 
to check that the transition probabilities given in the tables above satisfy the ergodicity assumption (A1).

4. Both the players have one constraint, i.e., player 1 has one subscription based constraint and player 2 has one realization based constraint. The subscription cost of player 1 and the immediate cost of player 2 corresponding to each state-action pair are given in Table 2(a) and 2(b) respectively.

Table 2 Costs defining constraints

(a) $d^{1,1}(s,a^1)$

             s = 1    s = 2
a^1 = 1        2        3
a^1 = 2        3        1

(b) $d^{2,1}(s,a^2)$

             s = 1    s = 2
a^2 = 1        1        4
a^2 = 2        2        5

5. The bounds defining the constraints are $\xi^1_1 = 4$, $\xi^2_1 = 2.5$.

From Table 1(a) and 1(b) it is clear that the game is controlled by player 2. 

(i) For the average cost criterion we solve the quadratic program [QP1] corresponding to the above data using MATLAB and obtain

$\eta^* = (3.0278, 4.1667, 2.833, 3.8667, 1.3067, 0.6944, 0.3056, 0.3472, 0.6528, 0.2667, 0.36, 0.3733, 0, 0.1867, 0)$.

Note that at $\eta^*$ the objective function value is zero and hence it is a global minimum of the quadratic program. We have $x^*(1,1) = 0.2667$, $x^*(1,2) = 0.36$, $x^*(2,1) = 0.3733$, $x^*(2,2) = 0$. From (10) we have $g^*(1,1) = 0.4256$, $g^*(1,2) = 0.5744$, $g^*(2,1) = 1$, $g^*(2,2) = 0$. From Theorem 1(b) the Nash equilibrium of the constrained stochastic game defined above with average cost criterion is

$f^* = ((0.6944, 0.3056), (0.3472, 0.6528))$, $g^* = ((0.4256, 0.5744), (1, 0))$

and the average costs of both the players at the Nash equilibrium $(f^*, g^*)$ are

$C^1_{ea}(f^*, g^*) = 4.4268$,
$C^2_{ea}(f^*, g^*) = 3.0279$.

(ii) For the discounted cost criterion we take $\beta = 0.5$, $\gamma = (0.5, 0.5)$. We solve the quadratic program [QP2] corresponding to the above data using MATLAB and obtain

$\eta^* = (10.2222, 10.8888, 3.5833, 1.4583, 1, 0, 0.5, 0.5, 0.3333, 0.25, 0.4167, 0, 0.2083, 0.9444)$.

Note that at $\eta^*$ the objective function value is zero and hence it is a global minimum of the quadratic program. We have $x^*(1,1) = 0.3333$, $x^*(1,2) = 0.25$, $x^*(2,1) = 0.4167$, $x^*(2,2) = 0$. From (20) we have $g^*(1,1) = 0.5714$, $g^*(1,2) = 0.4286$, $g^*(2,1) = 1$, $g^*(2,2) = 0$. From Theorem 2(b) the Nash equilibrium of the constrained stochastic game defined above with discounted cost criterion is

$f^* = ((1, 0), (0.5, 0.5))$, $g^* = ((0.5714, 0.4286), (1, 0))$

and the discounted costs of both the players at the Nash equilibrium $(f^*, g^*)$ are

$C^1_\beta(\gamma, f^*, g^*) = 4.2082$,
$C^2_\beta(\gamma, f^*, g^*) = 2.9166$.
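The reported solutions can be cross-checked using only the numbers printed above and in Table 2. The sketch below is a consistency check, not the MATLAB computation used to obtain the solutions. It assumes that the components of $\eta^*$ are ordered as in Theorems 1 and 2, i.e., $(v, u, z, f, x, \delta^1, \delta^2)$ for the average cost and $(u, z, f, x, \delta^1, \delta^2)$ for the discounted cost; the helper names are hypothetical. It recovers $g^*$ by normalizing $x^*$, compares the dual objective values with the reported equilibrium costs (they coincide when the objective value is zero), and evaluates both constraints.

```python
def normalize_rows(x):
    """Recover a stationary strategy from an occupation measure, state by state."""
    g = []
    for row in x:
        total = sum(row)
        # A zero row belongs to an unvisited state; any distribution may be chosen there.
        g.append([v / total for v in row] if total > 0 else [1.0 / len(row)] * len(row))
    return g


def weighted_sum(d, w):
    """Sum over state-action pairs of d(s, a) * w(s, a)."""
    return sum(d[s][a] * w[s][a] for s in range(len(d)) for a in range(len(d[s])))


XI1, XI2 = 4.0, 2.5                 # bounds xi^1_1 and xi^2_1
D1 = [[2.0, 3.0], [3.0, 1.0]]       # d^{1,1}(s, a^1) from Table 2(a), indexed [s][a^1]
D2 = [[1.0, 2.0], [4.0, 5.0]]       # d^{2,1}(s, a^2) from Table 2(b), indexed [s][a^2]

# (i) Average cost criterion: eta* split into its blocks (ordering is an assumption)
avg = {"v": 3.0278, "z": [3.8667, 1.3067],
       "f": [[0.6944, 0.3056], [0.3472, 0.6528]],
       "x": [[0.2667, 0.36], [0.3733, 0.0]],
       "delta1": 0.1867, "delta2": 0.0}
avg_g = normalize_rows(avg["x"])
avg_dual_1 = sum(avg["z"]) - avg["delta1"] * XI1     # 1^T z - delta^1 xi^1
avg_dual_2 = avg["v"] - avg["delta2"] * XI2          # v - delta^2 xi^2

# (ii) Discounted cost criterion
beta, gamma = 0.5, [0.5, 0.5]
dis = {"u": [10.2222, 10.8888], "z": [3.5833, 1.4583],
       "f": [[1.0, 0.0], [0.5, 0.5]],
       "x": [[0.3333, 0.25], [0.4167, 0.0]],
       "delta1": 0.2083, "delta2": 0.9444}
dis_g = normalize_rows(dis["x"])
dis_dual_1 = sum(dis["z"]) - dis["delta1"] * XI1
dis_dual_2 = (1.0 - beta) * sum(p * u for p, u in zip(gamma, dis["u"])) - dis["delta2"] * XI2

# Left-hand sides of the subscription and realization based constraints
avg_constraint_1 = weighted_sum(D1, avg["f"])
avg_constraint_2 = weighted_sum(D2, avg["x"])
dis_constraint_1 = weighted_sum(D1, dis["f"])
dis_constraint_2 = weighted_sum(D2, dis["x"])
```

Up to the four-digit rounding of the printed values, the dual objective values match the reported costs, the constraint of player 1 is tight under both criteria (in line with $\delta^{1*} > 0$), and the constraint of player 2 is slack under the average cost criterion (where $\delta^{2*} = 0$) and tight under the discounted cost criterion.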

3 Constrained stochastic game with independent state processes 

In this section we consider an $N$-player constrained stochastic game with independent state processes as discussed in [5]. In these games each player controls his own Markov chain, whose transition probabilities do not depend on the states and actions of the other players. At any time, each player has information only about the current and past states of his own Markov chain and about his own previous actions, and has no information about the states and actions of the other players. However, each player wants to minimize his expected average cost, which depends on the strategies of all the players. The expected average constraints of each player also depend on the strategies of all the players. These games belong to the class of decentralized stochastic games.

The game is described by the tuple $(S^i, \gamma^i, A^i, c^i, d^i, p^i, \xi^i)$, $i = 1, 2, \dots, N$, where:

(i) $S^i$ is the finite state space of player $i$, $i = 1, \dots, N$. The generic element of $S^i$ is denoted by $s^i$. Define $S := \times_{i=1}^{N} S^i$ and $S^{-i} := \times_{j \ne i} S^j$ ($\times$ stands for the product space). The element of $S$ is denoted by $s$, where $s = (s^1, s^2, \dots, s^N)$, and $s^{-i} \in S^{-i}$ denotes the vector of states $s^j$, $j \ne i$.

(ii) $\gamma^i$ is the probability distribution for the initial state of player $i$, $i = 1, \dots, N$. We assume that the initial states of all the players are independent. Denote $\gamma = (\gamma^1, \gamma^2, \dots, \gamma^N)$.

(iii) $A^i$ is the finite action (strategy) set of player $i$ and its element is denoted by $a^i$, $i = 1, \dots, N$. $A^i(s^i)$ denotes the set of all actions of player $i$ at state $s^i$ and $A^i = \cup_{s^i \in S^i} A^i(s^i)$. We denote $a = (a^1, a^2, \dots, a^N)$ and $a^{-i}$ as the vector of actions $a^j$, $j \ne i$.

(iv) Define $\mathcal{K}^i = \{(s^i, a^i) \mid s^i \in S^i, a^i \in A^i(s^i)\}$, $i = 1, 2, \dots, N$, and $\mathcal{K} = \times_{i=1}^{N} \mathcal{K}^i$.

(v) $c^i : \mathcal{K} \to \mathbb{R}$ is the immediate cost of player $i$, $i = 1, \dots, N$. Specifically, $c^i(s, a)$ is the immediate cost incurred by player $i$ when the state of the players is $(s^1, s^2, \dots, s^N)$ and the actions chosen by them are $(a^1, a^2, \dots, a^N)$ respectively. Each player $i$, $i = 1, 2, \dots, N$, wants to minimize the expected average cost involving $c^i(\cdot, \cdot)$.

(vi) $d^i = (d^{i,1}, \dots, d^{i,n_i})$, where $d^{i,k} : \mathcal{K} \to \mathbb{R}$ for all $k = 1, 2, \dots, n_i$, are immediate costs of player $i$, $i = 1, \dots, N$. These $d^{i,k}(\cdot, \cdot)$ are involved in the $k$th constraint, $k = 1, 2, \dots, n_i$, on the expected average cost of player $i$, $i = 1, \dots, N$.

(vii) $p^i : \mathcal{K}^i \to \wp(S^i)$ is the transition probability of player $i$, $i = 1, \dots, N$, where $p^i(\bar{s}^i | s^i, a^i)$ is the probability that the state of player $i$ moves from state $s^i$ to $\bar{s}^i$ if he chooses action $a^i \in A^i(s^i)$.

(viii) $\xi^i = (\xi^i_1, \dots, \xi^i_{n_i})$ are the bounds defining the constraints of player $i$, $i = 1, \dots, N$.

The game dynamics are as follows. Initially, at time $t = 0$, the state of the game is $s = (s^1, s^2, \dots, s^N)$, where $s^i \in S^i$ is chosen according to the independent random variables $\gamma^i$, $i = 1, 2, \dots, N$. Players independently choose actions $a = (a^1, a^2, \dots, a^N)$ with $a^i \in A^i(s^i)$, $i = 1, 2, \dots, N$. Player $i$ obtains an immediate cost $c^i(s,a)$, $i = 1, 2, \dots, N$. Apart from this cost, player $i$, $i = 1, 2, \dots, N$, also receives another $n_i$ costs $\{d^{i,k}(s,a)\}$, $k = 1, 2, \dots, n_i$. These $\{d^{i,k}(\cdot, \cdot)\}$, $k = 1, 2, \dots, n_i$, are involved in the expected average cost functionals of player $i$, which are constrained by the specified bounds $\{\xi^i_k\}$, $k = 1, \dots, n_i$. Now, the state of player $i$ switches to a new state $\bar{s}^i$ at time $t = 1$ with probability $p^i(\bar{s}^i | s^i, a^i)$, $i = 1, \dots, N$. At time $t = 1$, in state $\bar{s}^i$, player $i$ then independently chooses an action $\bar{a}^i$ and receives the costs $c^i(\bar{s}, \bar{a})$ and $\{d^{i,k}(\bar{s}, \bar{a})\}$, $k = 1, \dots, n_i$, $i = 1, \dots, N$. The next state of this player is again drawn according to his transition probability $p^i(\cdot | \bar{s}^i, \bar{a}^i)$. The dynamics of the Markov chains repeat at the new state and the game continues over the infinite time horizon.

While transition probabilities depend only on the present state and the action used, the actions that are used can depend on the 'past', as in history dependent strategies. Define a history of player $i$, $i = 1, 2, \dots, N$, at time $t$ as $h^i_t = (s^i_0, a^i_0, s^i_1, a^i_1, \dots, s^i_{t-1}, a^i_{t-1}, s^i_t)$ where $s^i_t \in S^i$, $a^i_t \in A^i(s^i_t)$, $i = 1, 2, \dots, N$, $t = 0, 1, 2, \dots$. Let $H^i_t$ denote the set of all possible histories of length $t$ of player $i$. Each player observes his own history and does not have any information about the other players' histories. A decision rule $f^i_t : H^i_t \to \wp(A^i(s^i_t))$ of player $i$ at time $t$ is a function which assigns to each history of length $t$ of player $i$ a probability measure over the action set of player $i$. This means that under decision rule $f^i_t$ player $i$ chooses action $a^i$ with probability $f^i_t(h^i_t, a^i)$. The sequence of decision rules is called the strategy of the player. Let $f^{hi} = (f^i_0, f^i_1, \dots, f^i_t, \dots)$ denote the strategy of player $i$, $i = 1, 2, \dots, N$; it is called a history dependent (behavioral) strategy. Note that the strategies of players do not depend on the realizations of the costs. If strategies were allowed to depend on such costs, then a player could use the costs to estimate the states and actions of the other players.

Let $F^i$ denote the set of all history dependent strategies of player $i$ and $F = \times_{i=1}^{N} F^i$ be the class of history dependent multi-strategies. A strategy is called Markovian if at every decision epoch the decision rule depends only on the current state, although the decision rule can differ from epoch to epoch. A stationary strategy is a Markovian strategy which is independent of time, i.e., at every decision epoch the decision rule is the same. So, for a stationary strategy, $f^i_t = f^i$ for all $t$, i.e., $(f^i, f^i, f^i, \dots)$ is a stationary strategy of player $i$. We denote, with some abuse of notation, $f^i$ as the stationary strategy of player $i$. Let $F^i_S$ denote the set of all stationary strategies of player $i$ and $F_S = \times_{i=1}^{N} F^i_S$ denote the class of stationary multi-strategies. For $i = 1, 2, \dots, N$, a stationary strategy $f^i \in F^i_S$ is identified with $f^i = ((f^i(1))^T, (f^i(2))^T, \dots, (f^i(|S^i|))^T)^T$, where $f^i(s^i) = (f^i(s^i,1), f^i(s^i,2), \dots, f^i(s^i,|A^i(s^i)|))^T$ for all $s^i \in S^i$. For all $s^i \in S^i$, $f^i(s^i, a^i)$ is then the probability of choosing action $a^i \in A^i(s^i)$ by player $i$, $i = 1, \dots, N$. For $f^h \in F$ we denote by $f^{h,-i}$ the vector of strategies $f^{hj}$, $j \ne i$, and for any $g^{hi} \in F^i$ we define $(f^{h,-i}, g^{hi})$ to be the multi-strategy where, for $j \ne i$, player $j$ uses $f^{hj}$ and player $i$ uses $g^{hi}$. Under mild assumptions, which we also make, Altman et al. [2] show that a Nash equilibrium exists for the above constrained stochastic game within the class of stationary strategies.

This leads to the introduction of the vector stochastic process $\{X_t, A_t\}_{t=0}^{\infty}$, where $X_t = (X^1_t, X^2_t, \dots, X^N_t)$, $A_t = (A^1_t, A^2_t, \dots, A^N_t)$, $X^i_t$ denotes the state of player $i$ and $A^i_t$ denotes the action chosen by player $i$ at time $t$, $t = 0, 1, \dots$. An initial distribution $\gamma$ together with a multi-strategy $f^h \in F$ defines a unique probability measure $P^{f^h}_\gamma$ on an appropriate probability space, with respect to which the laws of the vector stochastic process $\{X_t, A_t\}_{t=0}^{\infty}$ of states and actions can be defined. The expectation operator on this probability space is denoted by $E^{f^h}_\gamma$.

The expected average costs

These costs are average functionals of the states and actions of all the players, and each player minimizes his cost functional. For a given initial distribution $\gamma$ and multi-strategy $f^h$, the expected average cost of player $i$, $i = 1, 2, \dots, N$, is defined as

$C^i_{ea}(\gamma, f^h) = \limsup_{T \to \infty} \dfrac{1}{T} \sum_{t=0}^{T-1} E^{f^h}_\gamma\, c^i(X_t, A_t)$. (25)

The expected average constraints

The constraints of each player are defined by average functionals of the states and actions of all the players, which are bounded by given reals. For a given initial distribution $\gamma$ and multi-strategy $f^h$, the expected average costs of player $i$, $i = 1, 2, \dots, N$, are defined as

$D^{i,k}_{ea}(\gamma, f^h) = \limsup_{T \to \infty} \dfrac{1}{T} \sum_{t=0}^{T-1} E^{f^h}_\gamma\, d^{i,k}(X_t, A_t)$, $\forall\, k = 1, 2, \dots, n_i$.

$D^{i,k}_{ea}(\cdot, \cdot)$ can capture the average consumption of resource $k$, $k = 1, 2, \dots, n_i$, by player $i$, $i = 1, 2, \dots, N$. The constraints of player $i$, $i = 1, 2, \dots, N$, are given as

$D^{i,k}_{ea}(\gamma, f^h) \le \xi^i_k$, $\forall\, k = 1, 2, \dots, n_i$. (26)

The constraints (26) capture the fact that the average consumption of resource $k$ by player $i$, $i = 1, 2, \dots, N$, when player $i$ uses strategy $f^{hi}$ and the other players use $f^{h,-i}$, is not more than the given $\xi^i_k$, $k = 1, 2, \dots, n_i$.

All the players choose their strategies independently and want to minimize their expected average cost from (25) subject to their constraints from (26). We denote this constrained stochastic game by $G^c$. The multi-strategy $f^h = (f^{h1}, f^{h2}, \dots, f^{hN})$ is called $i$-feasible if it satisfies the $i$th player's constraints from (26), and it is called feasible if it is $i$-feasible for every $i = 1, 2, \dots, N$. Let $F^c$ denote the set of all feasible history dependent multi-strategies and $F^c_S$ denote the set of all stationary feasible multi-strategies for the constrained stochastic game $G^c$. We shall assume throughout that $F^c_S$ is non-empty.

We now recall the definition of Nash equilibrium as given in [2]. A multi-strategy $f^{h*} \in F^c$ is called a Nash equilibrium of the constrained stochastic game $G^c$ if for each player $i = 1, 2, \dots, N$ and for any $f^{hi}$ such that $(f^{hi}, f^{h,-i*})$ is $i$-feasible, one has that

$C^i_{ea}(\gamma, f^{h*}) \le C^i_{ea}(\gamma, f^{hi}, f^{h,-i*})$.

Thus, a unilateral deviation by any player $i$, $i = 1, \dots, N$, from the equilibrium strategy $f^{h*}$ is not beneficial, because either at least one of his constraints will be violated or it will result in a cost for player $i$ that is not lower than the one achieved by the feasible equilibrium strategy $f^{h*}$. A stationary multi-strategy $f^* \in F^c_S$ is said to be a Nash equilibrium of the constrained stochastic game $G^c$ if for each player $i = 1, 2, \dots, N$ and for any $f^i$ such that $(f^i, f^{-i*})$ is $i$-feasible, one has that

$C^i_{ea}(\gamma, f^*) \le C^i_{ea}(\gamma, f^i, f^{-i*})$.

This can be seen by noticing that when all players $j$, $j \ne i$, fix their strategies as stationary strategies, player $i$ is faced with a constrained Markov decision process (CMDP), for which an optimal strategy always exists in the space of stationary strategies [20].






Vikas Vikram Singh, N. Hemachandra



Assumptions [Altman, et al. J^/ 

Similar to [5], we also make the following assumptions:

(A1) Ergodicity: For each player $i = 1, \dots, N$ and for any stationary strategy $f^i$, the state process of player $i$ is an irreducible Markov chain with one ergodic class (and possibly some transient states).

(A2) Strong Slater condition: Every player $i$, $i = 1, \dots, N$, has some strategy $g^i$ such that for any multi-strategy $f^{-i}$ of the other players,

$D^{i,k}_{ea}(\gamma, g^i, f^{-i}) < \xi^i_k$, $\forall\, k = 1, 2, \dots, n_i$.

(A3) The players do not observe their costs, i.e., the strategy chosen by any player does not depend on the realization of the cost.

The last assumption is due to the definition of the strategies. If strategies were allowed to depend on the realization of the costs, then a player could use the cost to estimate the other players' states and actions. As a Nash equilibrium exists in stationary strategies under the assumptions (A1)-(A3) [2], from now onwards we restrict ourselves to the class of stationary strategies.

3.1 Average occupation measure 

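For each player $i$, $i = 1, 2, \dots, N$, using a stationary strategy $f^i$ and initial distribution $\gamma^i$, define the average occupation measure as follows. For all $s^i \in S^i$, $a^i \in A^i(s^i)$, $\pi^i_{ea}(\gamma^i, f^i; s^i, a^i)$ is given by

$\pi^i_{ea}(\gamma^i, f^i; s^i, a^i) = \pi^{f^i}(s^i) f^i(s^i, a^i)$, (27)

where $\pi^{f^i} = (\pi^{f^i}(1), \pi^{f^i}(2), \dots, \pi^{f^i}(|S^i|))$ is the unique steady state distribution of the Markov chain induced by the strategy $f^i$ of player $i$, which exists under (A1). $\pi^i_{ea}(\gamma^i, f^i)$ can be considered as a probability measure over $\mathcal{K}^i$ that assigns probability $\pi^i_{ea}(\gamma^i, f^i; s^i, a^i)$ to the state-action pair $(s^i, a^i)$. The occupation measure defined in (27) is unique and independent of the initial distribution $\gamma^i$, so we drop $\gamma^i$ from the notation. For any multi-strategy $f \in F_S$ the expected average costs of each player $i$, $i = 1, 2, \dots, N$, can be written in terms of the occupation measures as

$C^i_{ea}(f) = \sum_{(s,a) \in \mathcal{K}} \left[ \prod_{j=1}^{N} \pi^j_{ea}(f^j; s^j, a^j) \right] c^i(s,a)$,

$D^{i,k}_{ea}(f) = \sum_{(s,a) \in \mathcal{K}} \left[ \prod_{j=1}^{N} \pi^j_{ea}(f^j; s^j, a^j) \right] d^{i,k}(s,a)$, $\forall\, k = 1, 2, \dots, n_i$.

Let $Q^i_{ea}$, $i = 1, 2, \dots, N$, be the set of vectors $x^i \in \mathbb{R}^{|\mathcal{K}^i|}$ satisfying

(i) $\sum_{(s^i,a^i) \in \mathcal{K}^i} \left( \delta(s^i, \bar{s}^i) - p^i(\bar{s}^i | s^i, a^i) \right) x^i(s^i, a^i) = 0$, $\forall\, \bar{s}^i \in S^i$

(ii) $\sum_{(s^i,a^i) \in \mathcal{K}^i} x^i(s^i, a^i) = 1$

(iii) $x^i(s^i, a^i) \ge 0$, $\forall\, s^i \in S^i,\ a^i \in A^i(s^i)$.

The space of stationary strategies is complete, i.e., the set of occupation measures achieved by history dependent strategies equals the set achieved by stationary strategies, and further equals the set $Q^i_{ea}$, $i = 1, 2, \dots, N$ [20]. It is known that for each $(s^i, a^i) \in \mathcal{K}^i$, $x^i(s^i, a^i) = \pi^i_{ea}(f^i; s^i, a^i)$ where

$f^i(s^i, a^i) = \dfrac{x^i(s^i, a^i)}{\sum_{a^i \in A^i(s^i)} x^i(s^i, a^i)}$ (28)

whenever the denominator is nonzero (when it is zero, $f^i(s^i)$ is chosen arbitrarily from $\wp(A^i(s^i))$) [20].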
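The relation between a stationary strategy, its average occupation measure (27) and the recovery formula (28) can be illustrated for a single player as follows. This is a minimal sketch under stated assumptions: the transition probabilities `P`, the strategy `f` and all function names are hypothetical, and the steady state distribution is approximated by power iteration on a lazy version of the induced chain (which has the same stationary distribution), a choice made here for simplicity rather than a method prescribed by the paper.

```python
def induced_chain(P, f):
    """Transition matrix of the player's Markov chain under the stationary strategy f."""
    n = len(P)
    return [[sum(f[s][a] * P[s][a][t] for a in range(len(f[s]))) for t in range(n)]
            for s in range(n)]


def stationary_distribution(M, iterations=500):
    """Approximate the steady state distribution, assuming one ergodic class as in (A1).

    Each step averages the current vector with its one-step update, which leaves the
    stationary distribution unchanged and avoids oscillation for periodic chains.
    """
    n = len(M)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(pi[s] * M[s][t] for s in range(n)) for t in range(n)]
        pi = [0.5 * pi[t] + 0.5 * step[t] for t in range(n)]
    return pi


def average_occupation(P, f):
    """Occupation measure (27): x(s, a) = pi(s) * f(s, a)."""
    pi = stationary_distribution(induced_chain(P, f))
    return [[pi[s] * f[s][a] for a in range(len(f[s]))] for s in range(len(P))]


def strategy_from_occupation(x):
    """Formula (28); a uniform rule is used where the denominator is zero (arbitrary choice)."""
    out = []
    for row in x:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else [1.0 / len(row)] * len(row))
    return out


# Hypothetical data for one player with two states and two actions
P = [[[0.5, 0.5], [0.2, 0.8]],
     [[0.7, 0.3], [0.4, 0.6]]]
f = [[0.25, 0.75], [0.5, 0.5]]

x = average_occupation(P, f)
recovered = strategy_from_occupation(x)
# Residuals of the flow constraint (i) defining the set of occupation measures
flow = [sum(((1.0 if s == t else 0.0) - P[s][a][t]) * x[s][a]
            for s in range(2) for a in range(2)) for t in range(2)]
```

For this data the induced chain has rows $(0.275, 0.725)$ and $(0.55, 0.45)$, so the steady state probability of the first state is $0.55 / 1.275$; the computed `x` sums to one, satisfies the flow constraint, and (28) returns the original strategy.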

We use the following notations throughout this section. For $i = 1, 2, \dots, N$,

• $u^i = (u^i(1), u^i(2), \dots, u^i(|S^i|))^T$.

• $v^i \in \mathbb{R}$.

• $x^i = ((x^i(1))^T, (x^i(2))^T, \dots, (x^i(|S^i|))^T)^T$.

• $x^i(s^i) = (x^i(s^i, 1), x^i(s^i, 2), \dots, x^i(s^i, |A^i(s^i)|))^T$.

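The costs of player $i$, $i = 1, 2, \dots, N$, when he uses action $a^i$ at state $s^i$ and the other players use $f^{-i}$ are defined as in [3],

$c^i(f^{-i}; s^i, a^i) = \sum_{(s,a)^{-i} \in \mathcal{K}^{-i}} \left[ \prod_{j=1; j \ne i}^{N} \pi^j_{ea}(f^j; s^j, a^j) \right] c^i(s,a)$,

$d^{i,k}(f^{-i}; s^i, a^i) = \sum_{(s,a)^{-i} \in \mathcal{K}^{-i}} \left[ \prod_{j=1; j \ne i}^{N} \pi^j_{ea}(f^j; s^j, a^j) \right] d^{i,k}(s,a)$, $\forall\, k = 1, 2, \dots, n_i$.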
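For $N = 2$ the definitions above reduce to averaging the immediate cost of player 1 over the occupation measure of player 2. The sketch below is illustrative only: the cost function `c1`, the two probability vectors `x1` and `x2`, and the helper names are hypothetical. It checks that weighting the coefficients $c^1(f^{-1}; s^1, a^1)$ by $x^1$ gives the same value as the product-form expected average cost.

```python
from itertools import product


def marginal_cost(cost, x_other, states, actions):
    """c^1(f^{-1}; s1, a1) for N = 2: cost averaged over player 2's occupation measure."""
    return {(s1, a1): sum(x_other[(s2, a2)] * cost(s1, a1, s2, a2)
                          for (s2, a2) in x_other)
            for s1, a1 in product(states, actions)}


def joint_cost(cost, x1, x2):
    """Product-form expected average cost: sum over (s, a) of x1 * x2 * c^1(s, a)."""
    return sum(x1[k1] * x2[k2] * cost(k1[0], k1[1], k2[0], k2[1])
               for k1 in x1 for k2 in x2)


def c1(s1, a1, s2, a2):
    """Hypothetical immediate cost of player 1 (not from the paper)."""
    return 1.0 + s1 + 2.0 * a1 + 3.0 * s2 * a2


# Hypothetical occupation measures, keyed by (state, action) with labels 0 and 1
x1 = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
x2 = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.5, (1, 1): 0.0}

coefficients = marginal_cost(c1, x2, states=[0, 1], actions=[0, 1])
via_coefficients = sum(coefficients[k] * x1[k] for k in x1)
direct = joint_cost(c1, x1, x2)
```

Because the cost of player 1 is linear in $x^1$ once $x^2$ is fixed, his best response problem against a fixed stationary strategy of the other player is a linear program in $x^1$.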



3.2 Mathematical programming formulation 

We show the one to one correspondence between the stationary Nash equilibria 
of this game and the global minima of one mathematical program. 

Best response linear programs 

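The best response of each player $i$, $i = 1, 2, \dots, N$, against a fixed stationary strategy of the other players is given by solving a constrained Markov decision model, which, in turn, can be solved by a linear program in our setting [20]. The best response of player $i$ against a fixed strategy $f^{-i}$ of the other players is given by the linear program below:

$\min_{x^i} \sum_{(s^i,a^i) \in \mathcal{K}^i} c^i(f^{-i}; s^i, a^i) x^i(s^i, a^i)$
s.t.
(i) $\sum_{(s^i,a^i) \in \mathcal{K}^i} \left( \delta(s^i, \bar{s}^i) - p^i(\bar{s}^i | s^i, a^i) \right) x^i(s^i, a^i) = 0$, $\forall\, \bar{s}^i \in S^i$
(ii) $\sum_{(s^i,a^i) \in \mathcal{K}^i} x^i(s^i, a^i) = 1$
(iii) $\sum_{(s^i,a^i) \in \mathcal{K}^i} d^{i,k}(f^{-i}; s^i, a^i) x^i(s^i, a^i) \le \xi^i_k$, $\forall\, k = 1, 2, \dots, n_i$
(iv) $x^i(s^i, a^i) \ge 0$, $\forall\, s^i \in S^i,\ a^i \in A^i(s^i)$. (29)

If $x^{i*}$ is the optimal solution of the linear program (29), then the best response $f^{i*}$ of player $i$ can be obtained from (28) [20]. The dual of the linear program (29) is

$\max_{v^i, u^i, \delta^i} \left[ v^i - \sum_{k=1}^{n_i} \delta^i_k \xi^i_k \right]$
s.t.
(i) $v^i + u^i(s^i) \le c^i(f^{-i}; s^i, a^i) + \sum_{k=1}^{n_i} \delta^i_k d^{i,k}(f^{-i}; s^i, a^i) + \sum_{\bar{s}^i \in S^i} p^i(\bar{s}^i | s^i, a^i) u^i(\bar{s}^i)$, $\forall\, s^i \in S^i,\ a^i \in A^i(s^i)$
(ii) $\delta^i_k \ge 0$, $\forall\, k = 1, 2, \dots, n_i$. (30)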

By using the $N$ primal-dual pairs of linear programs given by (29), (30), we show the one to one correspondence between the stationary Nash equilibria of the constrained stochastic game $G^c$ and the global minima of a mathematical program [MP3]. Let $\zeta^T := (v^i, (u^i)^T, (x^i)^T, (\delta^i)^T)_{i=1}^{N}$ and $\psi(\zeta)$ denote the decision variables and the objective function of [MP3] respectively. $\zeta^T$ is a $1 \times (N + \sum_{i=1}^{N} |S^i| + \sum_{i=1}^{N} \sum_{s^i \in S^i} |A^i(s^i)| + \sum_{i=1}^{N} n_i)$ dimensional vector.

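Theorem 3 (a) If $(f^{i*})_{i=1}^{N}$ is a Nash equilibrium of the constrained stochastic game $G^c$, then there exists a vector $\zeta^{*T} = (v^{i*}, (u^{i*})^T, (x^{i*})^T, (\delta^{i*})^T)_{i=1}^{N}$ that is a global minimum of the mathematical program [MP3] given below,

[MP3] $\min_{\zeta} \sum_{i=1}^{N} \left[ \sum_{(s,a) \in \mathcal{K}} \left( \prod_{j=1}^{N} x^j(s^j, a^j) \right) c^i(s,a) - \left( v^i - \sum_{k=1}^{n_i} \delta^i_k \xi^i_k \right) \right]$
s.t.

(i) $v^i + u^i(s^i) \le \sum_{(s,a)^{-i} \in \mathcal{K}^{-i}} \left( \prod_{j=1; j \ne i}^{N} x^j(s^j, a^j) \right) c^i(s,a) + \sum_{k=1}^{n_i} \delta^i_k \sum_{(s,a)^{-i} \in \mathcal{K}^{-i}} \left( \prod_{j=1; j \ne i}^{N} x^j(s^j, a^j) \right) d^{i,k}(s,a) + \sum_{\bar{s}^i \in S^i} p^i(\bar{s}^i | s^i, a^i) u^i(\bar{s}^i)$, $\forall\, s^i \in S^i,\ a^i \in A^i(s^i),\ i = 1, 2, \dots, N$

(ii) $\sum_{(s^i,a^i) \in \mathcal{K}^i} \left( \delta(s^i, \bar{s}^i) - p^i(\bar{s}^i | s^i, a^i) \right) x^i(s^i, a^i) = 0$, $\forall\, \bar{s}^i \in S^i,\ i = 1, 2, \dots, N$

(iii) $\sum_{(s^i,a^i) \in \mathcal{K}^i} x^i(s^i, a^i) = 1$, $\forall\, i = 1, 2, \dots, N$

(iv) $\sum_{(s,a) \in \mathcal{K}} \left( \prod_{j=1}^{N} x^j(s^j, a^j) \right) d^{i,k}(s,a) \le \xi^i_k$, $\forall\, k = 1, 2, \dots, n_i,\ i = 1, 2, \dots, N$

(v) $x^i(s^i, a^i) \ge 0$, $\forall\, s^i \in S^i,\ a^i \in A^i(s^i),\ i = 1, 2, \dots, N$

(vi) $\delta^i_k \ge 0$, $\forall\, k = 1, 2, \dots, n_i,\ i = 1, 2, \dots, N$,

with $\psi(\zeta^*) = 0$.

(b) If $\zeta^{*T} = (v^{i*}, (u^{i*})^T, (x^{i*})^T, (\delta^{i*})^T)_{i=1}^{N}$ is a global minimum of [MP3] with $\psi(\zeta^*) = 0$, then $(f^{i*})_{i=1}^{N}$ is a Nash equilibrium of the constrained stochastic game $G^c$, where

$f^{i*}(s^i, a^i) = \dfrac{x^{i*}(s^i, a^i)}{\sum_{a^i \in A^i(s^i)} x^{i*}(s^i, a^i)}$

for all $s^i \in S^i$, $a^i \in A^i(s^i)$, $i = 1, 2, \dots, N$, whenever the denominator is non-zero (when it is zero, $f^{i*}(s^i)$ is chosen arbitrarily from $\wp(A^i(s^i))$).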

Proof (a) Let $(f^{i*})_{i=1}^{N}$ be a Nash equilibrium of the constrained stochastic game $G^c_{ea}$. For each $i = 1, 2, \dots, N$, we construct occupation measures $x^{i*}$ as in (27) corresponding to the stationary strategies $f^{i*}$. Then, the constraints in (ii), (iii) and (v) are satisfied by $(x^{i*})_{i=1}^{N}$. The multi-strategy $(f^{i*})_{i=1}^{N}$ is feasible because it is a Nash equilibrium, so the constraints in (iv) are also satisfied by $(x^{i*})_{i=1}^{N}$. For each $i = 1, 2, \dots, N$, $f^{i*}$ is a best response of player $i$ against the fixed strategy $f^{-i*}$ of the other players; so, $x^{i*}$ as constructed above is an optimal solution of the linear program (29) for this fixed $f^{-i*}$, from Proposition 3.1 of [2]. From the strong duality theorem [51], [5S] there exists an optimal solution $(v^{i*}, u^{i*}, \delta^{i*})$ of (30) such that the constraints in (i) and (vi) of [MP3] are satisfied by $(v^{i*}, u^{i*}, x^{i*}, \delta^{i*})_{i=1}^{N}$ and the objective function values of (29) and (30) are the same. In other words, we have a point $\zeta^* = \left(v^{i*}, (u^{i*})^T, (x^{i*})^T, (\delta^{i*})^T\right)_{i=1}^{N}$ which is feasible for [MP3] and

$$\sum_{(s,a) \in \mathcal{K}} \left( \prod_{j=1}^{N} x^{j*}(s^j,a^j) \right) c^i(s,a) = v^{i*} - \sum_{k=1}^{n_i} \delta^{i*}_k \xi^i_k, \quad \forall\ i = 1, 2, \dots, N.$$

From the construction of the objective function, $\psi(\zeta^*) = 0$.

Let $\zeta$ be any feasible point of [MP3]. For each $i = 1, 2, \dots, N$, multiply each constraint in (i) of [MP3] corresponding to the pair $(s^i,a^i) \in \mathcal{K}^i$ by $x^i(s^i,a^i)$, add over all $(s^i,a^i) \in \mathcal{K}^i$, and then use the constraints (ii)-(vi) to obtain

$$\sum_{(s,a) \in \mathcal{K}} \left( \prod_{j=1}^{N} x^j(s^j,a^j) \right) c^i(s,a) \ge v^i - \sum_{k=1}^{n_i} \delta^i_k \xi^i_k, \quad \forall\ i = 1, 2, \dots, N. \qquad (31)$$

From (31) we have $\psi(\zeta) \ge 0$ for all feasible points $\zeta$ of [MP3]. Thus $\zeta^*$ is a global minimum of [MP3].
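The multiply-and-sum step behind (31) can be spelled out as follows. This is a sketch that assumes constraint (i) has the dual-feasibility form $v^i + u^i(s^i) - \sum_{s'^i} p^i(s'^i|s^i,a^i) u^i(s'^i) \le (\text{cost term}) + \sum_k \delta^i_k (\text{constraint-cost term})$; the index notation $s'^i$ is introduced here for illustration.

```latex
% Multiply constraint (i) by x^i(s^i,a^i) >= 0 (constraint (v)) and sum over (s^i,a^i) in K^i.
\begin{align*}
\sum_{(s^i,a^i)} x^i(s^i,a^i)\, v^i
  &= v^i
  && \text{by (iii)},\\
\sum_{(s^i,a^i)} x^i(s^i,a^i) \Big[ u^i(s^i) - \sum_{s'^i} p^i(s'^i|s^i,a^i)\, u^i(s'^i) \Big]
  &= \sum_{s'^i} u^i(s'^i) \sum_{(s^i,a^i)} \big( \delta(s^i,s'^i) - p^i(s'^i|s^i,a^i) \big) x^i(s^i,a^i) = 0
  && \text{by (ii)},\\
\sum_{k=1}^{n_i} \delta^i_k \sum_{(s,a) \in \mathcal{K}} \Big( \prod_{j=1}^{N} x^j(s^j,a^j) \Big) d^{i,k}(s,a)
  &\le \sum_{k=1}^{n_i} \delta^i_k \xi^i_k
  && \text{by (iv) and (vi)}.
\end{align*}
% Combining the three lines:
%   v^i <= \sum_{(s,a)} (\prod_j x^j(s^j,a^j)) c^i(s,a) + \sum_k \delta^i_k \xi^i_k,
% which is inequality (31) after moving the last sum to the left-hand side.
```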

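(b) Let $\zeta^*$ be a global minimum of [MP3] such that $\psi(\zeta^*) = 0$. As $\zeta^*$ is a feasible point of [MP3], (31) also holds for $\zeta^*$, i.e.,

$$\sum_{(s,a) \in \mathcal{K}} \left( \prod_{j=1}^{N} x^{j*}(s^j,a^j) \right) c^i(s,a) \ge v^{i*} - \sum_{k=1}^{n_i} \delta^{i*}_k \xi^i_k, \quad \forall\ i = 1, 2, \dots, N.$$

From the above we see that all $N$ terms of the objective function are non-negative at $\zeta^*$. But at $\zeta^*$ the objective function value is zero, which means that all the terms are individually zero, i.e.,

$$\sum_{(s,a) \in \mathcal{K}} \left( \prod_{j=1}^{N} x^{j*}(s^j,a^j) \right) c^i(s,a) = v^{i*} - \sum_{k=1}^{n_i} \delta^{i*}_k \xi^i_k, \quad \forall\ i = 1, 2, \dots, N. \qquad (32)$$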

Fix $\zeta^*$ and for each $i = 1, 2, \dots, N$, multiply each constraint in (i) corresponding to the pair $(s^i,a^i) \in \mathcal{K}^i$ by $x^i(s^i,a^i)$ and add over all $(s^i,a^i) \in \mathcal{K}^i$. Using the constraints (ii)-(vi) and (32), we have for each $i = 1, 2, \dots, N$,

$$\sum_{(s,a) \in \mathcal{K}} \left( \prod_{j=1}^{N} x^{j*}(s^j,a^j) \right) c^i(s,a) \le \sum_{(s,a) \in \mathcal{K}} x^i(s^i,a^i) \left( \prod_{j=1; j \ne i}^{N} x^{j*}(s^j,a^j) \right) c^i(s,a)$$

for all $i$-feasible $(x^i, x^{-i*})$. In other words, for each $i = 1, 2, \dots, N$,

$$C^i_{ea}(f^{i*}, f^{-i*}) \le C^i_{ea}(f^i, f^{-i*}), \quad \forall\ i\text{-feasible } (f^i, f^{-i*}).$$

That is, $(f^{i*})_{i=1}^{N}$ is a Nash equilibrium of the constrained stochastic game $G^c_{ea}$, where

$$f^{i*}(s^i,a^i) = \frac{x^{i*}(s^i,a^i)}{\sum_{a^i \in A^i(s^i)} x^{i*}(s^i,a^i)}$$

for all $s^i \in S^i$, $a^i \in A^i(s^i)$, $i = 1, 2, \dots, N$ whenever the denominator is non-zero (when it is zero $f^{i*}(s^i)$ is chosen arbitrarily from $\wp(A^i(s^i))$).

Remark 4 It is easy to see that [MP3] is also a non-convex constrained optimization problem.

3.2.1 Special cases

We consider two special cases. First, we consider a two player nonzero sum
constrained stochastic game as defined in Section 3 where the constraints of
both the players are decoupled. Next, we consider the two player zero sum game
considered in [7].



(i) The case of a two player game with decoupled constraints

Here we consider the situation where there are only two players and the constraints of each player do not depend on the strategies of the other player. This is possible when the immediate costs of each player which correspond to his constraints do not depend on the state and actions of the other player, i.e., $d^{i,k}(s^1,s^2,a^1,a^2) = d^{i,k}(s^i,a^i)$ for all $s^i \in S^i$, $a^i \in A^i(s^i)$, $k = 1, 2, \dots, n_i$, $i = 1, 2$. We see that the mathematical program [MP3] reduces to a quadratic program [QP3] as given below



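[QP3]
$$\min \sum_{i=1}^{2} \left[ \sum_{(s^1,a^1,s^2,a^2)} \left( \prod_{j=1}^{2} x^j(s^j,a^j) \right) c^i(s^1,s^2,a^1,a^2) - \left( v^i - \sum_{k=1}^{n_i} \delta^i_k \xi^i_k \right) \right]$$

s.t.

(i) $v^1 + u^1(s^1) \le \sum_{(s^2,a^2) \in \mathcal{K}^2} x^2(s^2,a^2)\, c^1(s^1,s^2,a^1,a^2) + \sum_{k=1}^{n_1} \delta^1_k d^{1,k}(s^1,a^1) + \sum_{s'^1 \in S^1} p^1(s'^1|s^1,a^1)\, u^1(s'^1)$, $\forall\ s^1 \in S^1$, $a^1 \in A^1(s^1)$

(ii) $v^2 + u^2(s^2) \le \sum_{(s^1,a^1) \in \mathcal{K}^1} x^1(s^1,a^1)\, c^2(s^1,s^2,a^1,a^2) + \sum_{k=1}^{n_2} \delta^2_k d^{2,k}(s^2,a^2) + \sum_{s'^2 \in S^2} p^2(s'^2|s^2,a^2)\, u^2(s'^2)$, $\forall\ s^2 \in S^2$, $a^2 \in A^2(s^2)$

(iii) $\sum_{(s^i,a^i) \in \mathcal{K}^i} \left( \delta(s^i,s'^i) - p^i(s'^i|s^i,a^i) \right) x^i(s^i,a^i) = 0$, $\forall\ s'^i \in S^i$, $i = 1, 2$

(iv) $\sum_{(s^i,a^i) \in \mathcal{K}^i} x^i(s^i,a^i) = 1$, $\forall\ i = 1, 2$

(v) $\sum_{(s^i,a^i) \in \mathcal{K}^i} d^{i,k}(s^i,a^i)\, x^i(s^i,a^i) \le \xi^i_k$, $\forall\ k = 1, 2, \dots, n_i$, $i = 1, 2$

(vi) $x^i(s^i,a^i) \ge 0$, $\forall\ s^i \in S^i$, $a^i \in A^i(s^i)$, $i = 1, 2$

(vii) $\delta^i_k \ge 0$, $\forall\ k = 1, 2, \dots, n_i$, $i = 1, 2$.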
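With decoupled constraints the only nonlinear part of the program is the bilinear cost term in the objective, which is what makes it a quadratic program. A minimal Python sketch of the objective in matrix form follows; the function name `qp3_objective`, the argument layout and the small data set are hypothetical and chosen only for illustration.

```python
import numpy as np

def qp3_objective(x1, x2, C1, C2, v, delta, xi):
    """Evaluate sum_i [ x1^T C_i x2 - (v_i - delta_i . xi_i) ] for two players.

    C_i[r, c] is the immediate cost of player i when player 1 uses the
    state-action pair with index r and player 2 the pair with index c.
    v, delta and xi are length-2 sequences holding each player's data.
    """
    total = 0.0
    for C, v_i, delta_i, xi_i in zip((C1, C2), v, delta, xi):
        bilinear_cost = x1 @ C @ x2                 # quadratic (bilinear) term
        dual_value = v_i - np.dot(delta_i, xi_i)    # value of the dual objective
        total += bilinear_cost - dual_value
    return float(total)

# Hypothetical toy data: two state-action pairs per player, one constraint each.
C1_toy = np.array([[1.0, 2.0], [3.0, 4.0]])
C2_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
x1_toy = np.array([0.5, 0.5])
x2_toy = np.array([1.0, 0.0])
psi_toy = qp3_objective(x1_toy, x2_toy, C1_toy, C2_toy,
                        v=(2.0, 0.5), delta=([0.0], [0.0]), xi=([1.0], [1.0]))
```

In this toy instance each player's bilinear cost equals the chosen dual value, so the objective evaluates to zero.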
(ii) Zero sum constrained stochastic game

As a further special case of the constrained stochastic game $G^c_{ea}$, we consider the two player zero sum game with decoupled constraints [7]. This class of games, with both unichain and multichain structure on the state processes of both the players, can be solved by linear programs [7]. For the zero sum case simply set $c^1(s^1,s^2,a^1,a^2) = -c^2(s^1,s^2,a^1,a^2) = c(s^1,s^2,a^1,a^2)$ for all $s^1 \in S^1$, $s^2 \in S^2$, $a^1 \in A^1(s^1)$, $a^2 \in A^2(s^2)$; then the quadratic program [QP3] can be separated into a primal-dual pair of linear programs which are the same as those given in [7] in the unichain case.



3.3 A numerical example 

In this section we give one numerical example of a two player game where 
constraints of both the players are decoupled. We compute the Nash equilib- 
rium of this game by solving quadratic program [QP3]. The components of 
the stochastic game are: 

1. The state spaces of player 1 and player 2 are $S^1 = \{1, 2\}$ and $S^2 = \{3, 4\}$
respectively.

2. The action sets of player 1 are $A^1(s^1) = \{1, 2\}$ for all $s^1 \in S^1$ and the action
sets of player 2 are $A^2(s^2) = \{1, 2\}$ for all $s^2 \in S^2$.

3. The immediate costs of both the players, which are involved in the expected average costs they want to minimize, are given in Tables 3(a), 3(b), 3(c) and 3(d). These tables summarize the immediate costs of both the players in all the possible states. For example, in Table 3(a) the entry (2, 3) represents 2 as the immediate cost of player 1 when the first player is in state 1 and chooses action 1 and the second player is in state 3 and chooses action 1. Similarly, 3 is the immediate cost of player 2 in the same situation, and the other entries in all the tables are read in the same way.

Table 3 Immediate costs

(a) $(s^1, s^2) = (1, 3)$

             $a^2 = 1$    $a^2 = 2$
$a^1 = 1$    (2, 3)       (3, 1)
$a^1 = 2$    (4, 2)       (2, 4)

(b) $(s^1, s^2) = (1, 4)$

             $a^2 = 1$    $a^2 = 2$
$a^1 = 1$    (5, 2)       (3, 4)
$a^1 = 2$    (3, 2)       (4, 1)

(c) $(s^1, s^2) = (2, 3)$

             $a^2 = 1$    $a^2 = 2$
$a^1 = 1$    (3, 5)       (4, 6)
$a^1 = 2$    (5, 2)       (2, 1)

(d) $(s^1, s^2) = (2, 4)$

             $a^2 = 1$    $a^2 = 2$
$a^1 = 1$    (4, 5)       (3, 1)
$a^1 = 2$    (1, 2)       (4, 3)
4. The transition probabilities of the first and second Markov chains (one for each player) are given in Tables 4(a) and 4(b) respectively. We can easily check that both the Markov chains are unichain. In the first Markov chain, state 1 is transient for some strategies of player 1 and state 2 is recurrent for every strategy of player 1. In the second Markov chain both the states 3 and 4 are recurrent for every strategy of player 2. So, the assumption (A1) is satisfied.

Table 4 Transition probabilities of both the Markov chains

(a) $p^1(\cdot|s^1,a^1)$

             $a^1 = 1$       $a^1 = 2$
$s^1 = 1$    (0.5, 0.5)      (0.33, 0.67)
$s^1 = 2$    (1, 0)          (0, 1)

(b) $p^2(\cdot|s^2,a^2)$

             $a^2 = 1$       $a^2 = 2$
$s^2 = 3$    (0.67, 0.33)    (0.4, 0.6)
$s^2 = 4$    (0.25, 0.75)    (1, 0)

5. Each player has one constraint. The immediate costs of both the players which are used in their expected average constraints are given in Tables 5(a) and 5(b).

Table 5 Immediate costs defining constraints

(a) $d^1(s^1,a^1)$

             $a^1 = 1$    $a^1 = 2$
$s^1 = 1$    7            4
$s^1 = 2$    2            5

(b) $d^2(s^2,a^2)$

             $a^2 = 1$    $a^2 = 2$
$s^2 = 3$    4            3
$s^2 = 4$    3            5

6. The bounds defining the constraints are $\xi^1 = 5$, $\xi^2 = 3.5$.
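The unichain claims in item 4 can be checked numerically for pure stationary strategies. The following Python sketch is an illustration only: the helper names `stationary` and `pure_strategy_distributions` are hypothetical, only the four pure strategies of each player are enumerated, and randomized strategies are not covered.

```python
import itertools
import numpy as np

# Table 4 data: table[s][a] is the next-state distribution, states in increasing order.
P1_TABLE = {1: {1: [0.5, 0.5], 2: [0.33, 0.67]}, 2: {1: [1.0, 0.0], 2: [0.0, 1.0]}}
P2_TABLE = {3: {1: [0.67, 0.33], 2: [0.4, 0.6]}, 4: {1: [0.25, 0.75], 2: [1.0, 0.0]}}

def stationary(P):
    """Stationary distribution of a unichain transition matrix P (rows sum to 1)."""
    n = P.shape[0]
    # Solve pi (P - I) = 0 together with sum(pi) = 1 in the least-squares sense.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def pure_strategy_distributions(table):
    """Map each pure strategy (one action per state) to its stationary distribution."""
    states = sorted(table)
    result = {}
    for actions in itertools.product([1, 2], repeat=len(states)):
        P = np.array([table[s][a] for s, a in zip(states, actions)])
        result[actions] = stationary(P)
    return result

dist1 = pure_strategy_distributions(P1_TABLE)  # keys: (action at state 1, action at state 2)
dist2 = pure_strategy_distributions(P2_TABLE)  # keys: (action at state 3, action at state 4)
```

Under these data, state 1 carries zero stationary mass whenever player 1 chooses action 2 in state 2, while every pure strategy of player 2 puts positive mass on both of his states.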

We solve the quadratic program [QP3], corresponding to the above data, by
using MATLAB and obtain

$\zeta^* = (1.2941, 0, 0, 1.7059, -0.5882, 0.5882, 0, 0, 0, 1, 0, 0.2941, 0.7059, 0, 0, 0)$.

Note that at $\zeta^*$ the objective function value is zero and hence it is a global minimum of [QP3]. We have $x^{1*} = (0, 0, 0, 1)$ and $x^{2*} = (0, 0.2941, 0.7059, 0)$; then from Theorem 3(b) the Nash equilibrium $(f^{1*}, f^{2*})$ of the constrained stochastic game $G^c_{ea}$ is given by

$f^{1*} = ((\alpha, 1 - \alpha), (0, 1))$ for all $\alpha \in [0, 1]$,

$f^{2*} = ((0, 1), (1, 0))$.

Note that under $f^{1*}$ player 1 can use any randomized strategy at state 1, which comes from the fact that state 1 is transient under $f^{1*}$. The costs of both the players at the Nash equilibrium $(f^{1*}, f^{2*})$ are

$C^1_{ea}(f^{1*}, f^{2*}) = 1.2941$,

$C^2_{ea}(f^{1*}, f^{2*}) = 1.7059$.
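The reported solution can be checked against the data of Tables 3, 4 and 5. The Python sketch below is not the authors' MATLAB code. It assumes the 16 reported components are ordered as $(v^1, u^1, v^2, u^2, x^1, x^2, \delta^1, \delta^2)$, an ordering inferred from the reported $x^{1*}$ and $x^{2*}$, and that the dual-feasibility constraints have the form "$v^i + u^i(s^i) \le$ cost term $+\ \delta^i d^i +$ expected $u^i$". All variable names are hypothetical.

```python
import numpy as np

# Rows index (s1,a1) in the order (1,1),(1,2),(2,1),(2,2);
# columns index (s2,a2) in the order (3,1),(3,2),(4,1),(4,2).
C1 = np.array([[2, 3, 5, 3], [4, 2, 3, 4], [3, 4, 4, 3], [5, 2, 1, 4]], dtype=float)
C2 = np.array([[3, 1, 2, 4], [2, 4, 2, 1], [5, 6, 5, 1], [2, 1, 2, 3]], dtype=float)
P1 = np.array([[0.5, 0.5], [0.33, 0.67], [1, 0], [0, 1]], dtype=float)    # Table 4(a)
P2 = np.array([[0.67, 0.33], [0.4, 0.6], [0.25, 0.75], [1, 0]], dtype=float)  # Table 4(b)
d1 = np.array([7, 4, 2, 5], dtype=float)                                  # Table 5(a)
d2 = np.array([4, 3, 3, 5], dtype=float)                                  # Table 5(b)
xi1, xi2 = 5.0, 3.5
S = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # S[(s,a), s'] = 1 if s == s'

# Reported solution, split under the assumed component ordering.
v1, u1 = 1.2941, np.array([0.0, 0.0])
v2, u2 = 1.7059, np.array([-0.5882, 0.5882])
x1 = np.array([0.0, 0.0, 0.0, 1.0])
x2 = np.array([0.0, 0.2941, 0.7059, 0.0])
delta1, delta2 = 0.0, 0.0

cost1 = x1 @ C1 @ x2                      # expected average cost of player 1
cost2 = x1 @ C2 @ x2                      # expected average cost of player 2
psi = (cost1 - (v1 - delta1 * xi1)) + (cost2 - (v2 - delta2 * xi2))

# Slacks of the dual-feasibility constraints (non-negative means satisfied).
slack_p1 = (C1 @ x2 + delta1 * d1 + P1 @ u1) - (v1 + S @ u1)
slack_p2 = (x1 @ C2 + delta2 * d2 + P2 @ u2) - (v2 + S @ u2)

# Residuals of the flow constraints and values of the cost constraints.
flow1 = (S - P1).T @ x1
flow2 = (S - P2).T @ x2
constraint1, constraint2 = d1 @ x1, d2 @ x2
```

Because the reported numbers are rounded to four decimals, any comparison should use a tolerance of about $10^{-3}$ rather than exact equality.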
Appendix A

A single mathematical program for average and discounted cost criteria model

The mathematical programs [MP1] and [MP2] that characterize the stationary Nash
equilibria of single controller constrained stochastic games with average and discounted cost
criteria respectively can be recovered from one mathematical program [MP4] given below.

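[MP4]
$$\min \left[ \left( f^T C^1 x - \left( \mathbf{1}^T z - (\delta^1)^T \xi^1 \right) \right) + \left( f^T C^2 x - \left( v + (1 - \beta) \gamma^T u - (\delta^2)^T \xi^2 \right) \right) \right]$$

s.t.

(i) $v + u(s) \le \left[ (f(s))^T c^2(s) \right]_{a^2} + \sum_{l=1}^{n_2} \delta^2_l \left[ (f(s))^T D^{2,l}(s) \right]_{a^2} + \beta \sum_{s' \in S} p(s'|s,a^2)\, u(s')$, $\forall\ s \in S$, $a^2 \in A^2(s)$

(ii) $z(s) \le \left[ c^1(s) x(s) \right]_{a^1} + \sum_{k=1}^{n_1} \delta^1_k d^{1,k}(s,a^1)$, $\forall\ s \in S$, $a^1 \in A^1(s)$

(iii) $\sum_{(s,a^2) \in \mathcal{K}^2} \left[ \delta(s,s') - \beta p(s'|s,a^2) \right] x(s,a^2) = (1 - \beta) \gamma(s')$, $\forall\ s' \in S$

(iv) $\sum_{(s,a^2) \in \mathcal{K}^2} x(s,a^2) = 1$

(v) $\sum_{(s,a^1) \in \mathcal{K}^1} d^{1,k}(s,a^1) f(s,a^1) \le \xi^1_k$, $\forall\ k = 1, 2, \dots, n_1$

(vi) $\sum_{s \in S} (f(s))^T D^{2,l}(s) x(s) \le \xi^2_l$, $\forall\ l = 1, 2, \dots, n_2$

(vii) $\sum_{a^1 \in A^1(s)} f(s,a^1) = 1$, $\forall\ s \in S$

(viii) $f(s,a^1) \ge 0$, $\forall\ s \in S$, $a^1 \in A^1(s)$

(ix) $x(s,a^2) \ge 0$, $\forall\ s \in S$, $a^2 \in A^2(s)$

(x) $\delta^1_k \ge 0$, $\forall\ k = 1, 2, \dots, n_1$

(xi) $\delta^2_l \ge 0$, $\forall\ l = 1, 2, \dots, n_2$.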

The mathematical program [MP1] can be obtained by putting $\beta = 1$ in [MP4]. For discount
factor $\beta \in [0, 1)$ the constraint (iv) of [MP4] is redundant because it can be obtained by
summing (iii) over all $s' \in S$ and hence the variable $v$ is also redundant. So, by removing
constraint (iv) and variable $v$ from [MP4] we obtain [MP2].
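The redundancy of (iv) for $\beta \in [0, 1)$ can be written out explicitly. The sketch below assumes that $\gamma$ is a probability distribution on $S$ (so that its entries sum to one) and that $p(\cdot|s,a^2)$ is a probability distribution for every pair $(s,a^2)$.

```latex
% Sum constraint (iii) of [MP4] over all s' in S.
\begin{align*}
\sum_{s' \in S} \sum_{(s,a^2)} \big[ \delta(s,s') - \beta\, p(s'|s,a^2) \big] x(s,a^2)
  &= \sum_{(s,a^2)} x(s,a^2) \Big[ \sum_{s' \in S} \delta(s,s') - \beta \sum_{s' \in S} p(s'|s,a^2) \Big] \\
  &= (1 - \beta) \sum_{(s,a^2)} x(s,a^2), \\
\sum_{s' \in S} (1 - \beta)\, \gamma(s') &= 1 - \beta .
\end{align*}
% Equating both sides and dividing by (1 - beta) > 0 gives
%   \sum_{(s,a^2)} x(s,a^2) = 1,
% which is constraint (iv). For beta = 1 both sides vanish, so (iv) must be kept.
```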



